bike_q1 <- read.csv("2015-Q1-Trips-History-Data.csv")
bike_q2 <- read.csv("2015-Q2-Trips-History-Data.csv")
bike_q3 <- read.csv("2015-Q3-cabi-trip-history-data.csv")
bike_q4 <- read.csv("2015-Q4-Trips-History-Data.csv")
weather <- read.csv('weather_2015.csv')
bike <-read.csv("Capital_Bike_Share_Locations.csv",header=TRUE, sep=",")
\section{1. Introduction}

In the last decade there has been increasing concern regarding the environment and quality of life, especially in big cities. From increased taxation to financial incentives, different approaches and public policies have been proposed and tested around the world to address these concerns. In this context, shared cars and shared bicycles have become popular solutions in many cities to help mitigate traffic and environmental impact. How can these programs be set up for success?

\section{Research Question}

Due to the increasing importance and popularity of the Capital Bikeshare program, this project aims to: 1) identify the variables that most impact hourly ridership, and 2) develop a model to predict hourly bikeshare demand in the Greater Washington DC region based on historical ridership and weather data for 2015.

\section{2. Methods}

\section{2.1. Dataset}


To conduct the analysis, we used three datasets from 2015: 1) Capital Bikeshare ridership, containing information for individual bike rentals throughout the year, 2) daily DC weather information from Wunderground, and 3) Google Maps API location data.

\section{2.2. Data Wrangling, Cleaning, and Feature Engineering}

The analysis began by loading the R packages and the raw datasets for trip history and weather.

With the raw datasets loaded, the first challenge was to combine the various datasets.

The trip history datasets were addressed first. We reviewed the variables available in each set and adjusted them as necessary to ensure consistency across the separate datasets.

names(bike_q1)
## [1] "Total.duration..ms." "Start.date"          "Start.station"      
## [4] "End.date"            "End.station"         "Bike.number"        
## [7] "Subscription.Type"
names(bike_q2)
## [1] "Duration..ms."     "Start.date"        "Start.station"    
## [4] "End.date"          "End.station"       "Bike.number"      
## [7] "Subscription.type"
names(bike_q3)
## [1] "Duration..ms."        "Start.date"           "End.date"            
## [4] "Start.station.number" "Start.station"        "End.station.number"  
## [7] "End.station"          "Bike.."               "Member.type"
names(bike_q4)
## [1] "Duration..ms."        "Start.date"           "End.date"            
## [4] "Start.station.number" "Start.station"        "End.station.number"  
## [7] "End.station"          "Bike.."               "Member.type"
bike_q3$Start.station.number <- NULL
bike_q3$End.station.number <- NULL
bike_q4$Start.station.number <- NULL
bike_q4$End.station.number <- NULL
# Q2 has the same column order as Q1; only the first and last names differ
names(bike_q2) <- c("Total.duration..ms.", "Start.date", "Start.station",
                    "End.date", "End.station", "Bike.number", "Subscription.Type")
# Q3 and Q4 keep their own column order (End.date comes before Start.station)
q34_names <- c("Total.duration..ms.", "Start.date", "End.date",
               "Start.station", "End.station", "Bike.number", "Subscription.Type")
names(bike_q3) <- q34_names
names(bike_q4) <- q34_names
names(bike_q1)
## [1] "Total.duration..ms." "Start.date"          "Start.station"      
## [4] "End.date"            "End.station"         "Bike.number"        
## [7] "Subscription.Type"
names(bike_q2)
## [1] "Total.duration..ms." "Start.date"          "Start.station"      
## [4] "End.date"            "End.station"         "Bike.number"        
## [7] "Subscription.Type"
names(bike_q3)
## [1] "Total.duration..ms." "Start.date"          "End.date"           
## [4] "Start.station"       "End.station"         "Bike.number"        
## [7] "Subscription.Type"
names(bike_q4)
## [1] "Total.duration..ms." "Start.date"          "End.date"           
## [4] "Start.station"       "End.station"         "Bike.number"        
## [7] "Subscription.Type"
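
A check like the following could confirm that the quarterly data frames share the same set of column names before they are combined. This is a minimal sketch on toy data frames; `same_columns` is a hypothetical helper introduced here for illustration and is not part of the original analysis.

```r
# Hypothetical helper: TRUE when every data frame has the same set of column
# names as the first one (column order is ignored).
same_columns <- function(...) {
  dfs <- list(...)
  ref <- sort(names(dfs[[1]]))
  all(vapply(dfs, function(d) identical(sort(names(d)), ref), logical(1)))
}

# Toy stand-ins for quarterly files (values are made up)
q_a <- data.frame(Start.date = "1/1/2015 0:02", End.date = "1/1/2015 0:42",
                  Bike.number = "W00001")
q_b <- data.frame(Start.date = "7/1/2015 0:00", Bike.number = "W00002",
                  End.date = "7/1/2015 0:10")
q_c <- data.frame(Start.date = "10/1/2015 0:00", Bike.. = "W00003",
                  End.date = "10/1/2015 0:05")

same_columns(q_a, q_b)       # same names, different order
same_columns(q_a, q_b, q_c)  # q_c still carries the raw "Bike.." name
```

Comparing sorted names rather than positions reflects the fact that Q3 and Q4 keep a different column order from Q1 and Q2.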

With variable consistency across the individual datasets, we were able to use row bind to combine Q1-Q4 of trip history data.

bike_df <- rbind(bike_q1, bike_q2, bike_q3, bike_q4)
dim(bike_df)
## [1] 3192908       7
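
Because the Q3 and Q4 files keep their columns in a different order, it may help to see why the row bind is still safe: `rbind()` on data frames aligns columns by name rather than by position. The sketch below illustrates this on toy data frames (the station values are made up).

```r
# Toy frames with the same columns in a different order
q_a <- data.frame(Start.station = "A", End.station = "B",
                  stringsAsFactors = FALSE)
q_b <- data.frame(End.station = "D", Start.station = "C",
                  stringsAsFactors = FALSE)

# rbind() on data frames matches columns by name, not by position
stacked <- do.call(rbind, list(q_a, q_b))

# Row counts should add up, and values should land under the right name
nrow(stacked) == nrow(q_a) + nrow(q_b)
stacked$Start.station
stacked$End.station
```

Checking that the combined row count equals the sum of the parts is a cheap way to catch a file that was skipped or bound twice.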

Next, we focused on combining the full 2015 ridership dataset with the full 2015 weather dataset. To do this, we had to identify a common variable between the datasets. We chose to use “date” in the ymd format. With the common variable in place in both datasets, we used merge to join the ridership and weather datasets on that variable.

library(lubridate)  # provides mdy_hm(), ymd(), and date()
bike_df$Start.date <- mdy_hm(bike_df$Start.date)
bike_df$End.date <- mdy_hm(bike_df$End.date)
str(bike_df$Start.date)
##  POSIXct[1:3192908], format: "2015-01-01 00:02:00" "2015-01-01 00:02:00" ...
str(bike_df$End.date)
##  POSIXct[1:3192908], format: "2015-01-01 00:42:00" "2015-01-01 00:42:00" ...
bike_df$date<-date(bike_df$Start.date)
str(bike_df$date)
##  Date[1:3192908], format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
str(weather$EST)
##  Factor w/ 365 levels "2015-1-1","2015-1-10",..: 1 12 23 26 27 28 29 30 31 2 ...
weather$EST<-ymd(weather$EST)
str(weather$EST)
##  Date[1:365], format: "2015-01-01" "2015-01-02" "2015-01-03" "2015-01-04" ...
names(weather)[1] <- "date"
str(weather$date)
##  Date[1:365], format: "2015-01-01" "2015-01-02" "2015-01-03" "2015-01-04" ...
bike_weather <- merge(bike_df,weather,by="date")
dim(bike_weather)
## [1] 3192908      30
names(bike_weather)
##  [1] "date"                      "Total.duration..ms."      
##  [3] "Start.date"                "Start.station"            
##  [5] "End.date"                  "End.station"              
##  [7] "Bike.number"               "Subscription.Type"        
##  [9] "Max.TemperatureF"          "Mean.TemperatureF"        
## [11] "Min.TemperatureF"          "Max.Dew.PointF"           
## [13] "MeanDew.PointF"            "Min.DewpointF"            
## [15] "Max.Humidity"              "Mean.Humidity"            
## [17] "Min.Humidity"              "Max.Sea.Level.PressureIn" 
## [19] "Mean.Sea.Level.PressureIn" "Min.Sea.Level.PressureIn" 
## [21] "Max.VisibilityMiles"       "Mean.VisibilityMiles"     
## [23] "Min.VisibilityMiles"       "Max.Wind.SpeedMPH"        
## [25] "Mean.Wind.SpeedMPH"        "Max.Gust.SpeedMPH"        
## [27] "PrecipitationIn"           "CloudCover"               
## [29] "Events"                    "WindDirDegrees"
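
In the output above, the merged row count equals that of `bike_df`, which suggests every trip date found a weather match. The sketch below shows, on toy data, how such a many-to-one join could be checked more explicitly; the toy values and the deliberately missing day are assumptions for illustration only.

```r
# Toy trip-level data: several trips per day
trips <- data.frame(
  date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-02", "2015-01-03")),
  Bike.number = c("W1", "W2", "W3", "W4")
)
# Toy daily weather: one row per day, with 2015-01-03 missing on purpose
wx <- data.frame(
  date = as.Date(c("2015-01-01", "2015-01-02")),
  Mean.TemperatureF = c(30, 35)
)

# The key must be unique in the weather table, otherwise trips get duplicated
stopifnot(!anyDuplicated(wx$date))

inner <- merge(trips, wx, by = "date")                # default: drops unmatched trips
left  <- merge(trips, wx, by = "date", all.x = TRUE)  # keeps them, weather is NA

nrow(inner)                          # fewer rows than trips
nrow(left)                           # same as nrow(trips)
sum(is.na(left$Mean.TemperatureF))   # trips without a weather match
```

With `all.x = TRUE`, unmatched trips stay in the result and can be counted, whereas the default inner join drops them silently.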

While the ridership dataset provided some geographic information, we wanted a more robust set of geographic variables, such as city and ZIP code. We therefore used revgeocode to retrieve additional geographic variables from the Google Maps API, using the latitude/longitude of each station in the Capital Bikeshare locations file. The additional variables were combined with the merged ridership and weather dataset using row bind.

# Import dataset
library(ggmap)  # provides revgeocode()
bike <- read.csv("Capital_Bike_Share_Locations.csv", header = TRUE, sep = ",")
# Extract gps info for each bike location
gps <- bike[c(3, 5, 6)]
# Extract full address from gps info; revgeocode() expects c(longitude, latitude)
ad <- do.call(rbind, lapply(seq_len(nrow(gps)), function(i) revgeocode(as.numeric(gps[i, 3:2]))))
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.858971,-77.05323&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.85725,-77.05332&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.856425,-77.049232&sensor=false
## [output truncated: one "Information from URL" message per remaining station location]
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.92333,-77.0352&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.889988,-76.995193&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.912659,-77.017669&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.889908,-76.983326&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.897274,-76.994749&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.927095,-76.978924&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.897195,-76.983575&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.928644,-76.990955&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.843222,-76.999388&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.863897,-76.990037&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.903732,-76.987211&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.916787,-77.028139&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.852248,-77.105022&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.834108,-77.087323&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.880705,-77.08596&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.867262,-77.072315&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.887378,-77.001955&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.894851,-77.02324&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.917622,-77.01597&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.918155,-77.004746&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.892275,-77.013917&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.886978,-77.013769&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.887312,-77.025762&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.927497,-76.997194&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.862478,-77.086599&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.856319,-77.11153&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.871822,-77.107906&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.985404,-77.023082&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.981103,-77.097426&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.983838,-77.09221&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.096312,-77.192672&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.093783,-77.202501&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.988562,-77.096539&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.084125,-77.151291&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.98954,-77.098029&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.094772,-77.145213&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.98128,-77.011336&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.983627,-77.006311&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.99521,-77.02918&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.983525,-77.095367&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.961763,-77.085998&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.921074,-77.031887&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.87501,-77.0024&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.920387,-77.025672&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.876737,-76.994468&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.120045,-77.156985&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.099376,-77.188014&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.082779,-77.148827&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.123513,-77.15741&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.990249,-77.02935&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.107709,-77.152072&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.982456,-77.091991&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.000578,-77.00149&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.095661,-77.159048&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.989724,-77.023854&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.975,-77.01121&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.977933,-77.006472&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.992375,-77.100104&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.990639,-77.100239&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.977093,-77.094589&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.102099,-77.200322&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.094103,-77.132954&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.076331,-77.141378&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.102212,-77.177091&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.992679,-77.029457&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.999388,-77.031555&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.997033,-77.025608&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.114688,-77.171487&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.110314,-77.182669&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.923583,-77.050046&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.90706,-77.015231&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.898536,-76.931862&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.908473,-76.933099&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.878433,-77.03023&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.873755,-77.089233&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.86646,-77.04826&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.999634,-77.109647&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.96115,-77.088659&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.899703,-77.008911&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.837666,-77.09482&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.873219,-77.082104&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.869418,-77.095596&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.941154,-77.062036&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.103091,-77.196442&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.097636,-77.196636&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.997445,-77.023894&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.085394,-77.145803&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.900358,-77.012108&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.952369,-77.002721&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.975219,-77.016855&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.920682,-76.995876&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.866471,-77.076131&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.839912,-77.087083&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.119765,-77.166093&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.964992,-77.103381&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.084379,-77.146866&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.984691,-77.094537&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.88992,-77.071301&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.903295,-77.065884&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.804378,-77.060866&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.894941,-77.09169&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.869442,-77.104503&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.898404,-77.024281&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.897612,-77.080851&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.901755,-77.051084&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.801111,-77.068952&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.82175,-77.047494&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.802677,-77.063562&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.820064,-77.057619&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.82595,-77.058541&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.820932,-77.053096&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.833077,-77.059821&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.890612,-77.084801&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.986743,-77.000035&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.864702,-77.048672&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.96497,-77.075946&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.90366,-77.034846&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.89841,-77.039624&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.997653,-77.034499&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.912648,-77.041834&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.859254,-77.063275&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.908008,-76.996985&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.896456,-77.104562&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.898984,-77.078317&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.895377,-77.09713&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.898412,-77.043182&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.876528,-77.12712&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.862467,-77.068242&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.913761,-77.027025&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.86612,-77.08787&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.89967,-77.003666&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.90864,-77.02277&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.961339,-77.027855&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.894474,-76.974828&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.896134,-76.9929&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.915604,-76.983683&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.946182,-77.08059&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.994113,-77.076986&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=39.105295,-77.194774&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.898301,-77.118009&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.812718,-77.044097&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.80704,-77.059817&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.813485,-77.049468&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.999378,-77.097882&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.884829,-77.127671&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.958267,-77.084636&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.928893,-77.03625&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.912652,-77.036278&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.999679,-77.051168&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.979875,-77.093522&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.903658,-77.031737&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.908643,-77.012365&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.831516,-77.008133&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.893511,-77.041544&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.909394,-77.048728&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.995681,-77.038721&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.799267,-77.0447&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.925284,-77.032375&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.822738,-77.049265&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.90843,-77.02714&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.884377,-77.025791&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.947156,-77.065115&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.894972,-77.003135&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.828437,-77.086031&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.884859,-77.155988&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.880992,-77.135271&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.812711,-77.061715&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.820058,-77.062821&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.890493,-77.017253&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.890544,-77.049379&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.888097,-77.038325&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.798133,-77.0487&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.843422,-77.064016&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.903598,-77.01397&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.841291,-77.063093&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.797557,-77.053766&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.818748,-77.047783&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.829545,-77.047844&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.928552,-77.032224&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.90268,-77.035737&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.949813,-77.080217&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.91263,-76.971923&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.897407,-76.925907&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.870695,-76.982359&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.929261,-77.240654&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.92403,-77.235955&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.923116,-77.232108&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.924437,-77.217664&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.931911,-77.219261&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.928919,-77.225394&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.932636,-77.231825&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.962524,-77.361902&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.962095,-77.358815&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.960574,-77.356324&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.955079,-77.351649&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.957037,-77.359718&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.884916,-77.005965&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.948363,-77.338119&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.959633,-77.358741&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.892441,-77.048947&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.891805,-76.913563&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.960084,-77.353414&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.955314,-77.368416&sensor=false
## .
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?latlng=38.923083,-77.227417&sensor=false
# Correct row 177, whose address has an extra comma after the building name
ad <- as.data.frame(ad)
levels(ad[,1])[levels(ad[,1])=="Fowler Hall, 800 Florida Ave NE, Washington, DC 20002, USA"] <- "Fowler Hall 800 Florida Ave NE, Washington, DC 20002, USA"
# Split each address on commas: street, city, "state zip", country
ad1 <- data.frame(do.call(rbind, str_split(ad[,1], ',')))
# Split the "state zip" field on spaces; the first piece is empty because the field starts with a space
ad2 <- data.frame(do.call(rbind, str_split(ad1[,3], ' ')))
## Warning in (function (..., deparse.level = 1) : number of columns of result
## is not a multiple of vector length (arg 127)
ad3 <- cbind(ad1[1], ad1[2], ad2[2], ad2[3], ad1[4]) # Join street, city, state, zip, and country
colnames(ad3) <- c("Address", "City", "State", "Zip", "Country") # Rename the columns
gps2 <- cbind(gps, ad3) # Attach the address fields to the station data in gps
colnames(gps2)[1] <- "Start.station" # Match the station column name in the master dataset
master_df <- merge(bike_weather, gps2, by="Start.station", all.x=TRUE) # Left join: keep every trip
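
The warning raised while splitting the addresses shows that at least one address does not break into the same number of pieces as the others, in which case `rbind` recycles values to fill the row. A minimal sketch of a more defensive parse is given below, assuming addresses of the form `street, city, ST zip, country`. The helper `parse_address` and the sample strings are illustrative and were not part of the original analysis.

```r
# Hypothetical helper: split "street, city, ST zip, country" into fixed fields.
# Addresses that do not match the expected pattern yield NA in every field
# instead of recycled values.
parse_address <- function(x) {
  pattern <- "^(.*),\\s*([^,]+),\\s*([A-Z]{2})\\s+(\\d{5})(?:-\\d{4})?,\\s*([^,]+)$"
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  fields <- t(vapply(m, function(p) {
    # p holds the full match plus five capture groups when the pattern matches
    if (length(p) == 6) p[2:6] else rep(NA_character_, 5)
  }, character(5)))
  out <- as.data.frame(fields, stringsAsFactors = FALSE)
  names(out) <- c("Address", "City", "State", "Zip", "Country")
  out
}

# Illustrative input, including an address with an extra comma and one
# that cannot be parsed
sample_addresses <- c(
  "1600 Pennsylvania Ave NW, Washington, DC 20500, USA",
  "Fowler Hall, 800 Florida Ave NE, Washington, DC 20002, USA",
  "Unnamed Road"
)
parsed <- parse_address(sample_addresses)
```

Because the street field is matched greedily, an extra comma inside the street part stays in `Address`, and rows that fail to parse can be found afterwards with `is.na(parsed$City)`.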

After the merge, a few stations were missing address information and had to be spot-checked. For each of these stations we filled in the correct value of the city variable by hand.

master_df$City[master_df$Start.station=="Utah St & 11th St N "]<-"Arlington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Utah St & 11th
## St N ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Veterans Pl & Pershing Dr "]<-"Silver Spring"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Veterans Pl &
## Pershing Dr ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Washington Blvd & Walter Reed Dr "]<-"Arlington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Washington
## Blvd & Walter Reed Dr ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="8th & F St NW"]<-"Washington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "8th & F St
## NW", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Lee Hwy & N Nelson St"]<-"Arlington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Lee Hwy & N
## Nelson St", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="11th & K St NW"]<-"Washington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "11th & K St
## NW", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="20th & Bell St"]<-"Arlington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "20th & Bell
## St", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="23rd & E St NW "]<-"Washington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "23rd & E St NW
## ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="34th & Water St NW"]<-"Washington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "34th & Water
## St NW", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="34th St & Minnesota Ave SE"]<-"Washington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "34th St &
## Minnesota Ave SE", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Anacostia Ave & Benning Rd NE / River Terrace "]<-"Washington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Anacostia Ave
## & Benning Rd NE / River Terrace ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Court House Metro / 15th & N Uhle St "]<-"Arlington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Court House
## Metro / 15th & N Uhle St ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Fenton St & Ellsworth Dr "]<-"Silver Spring"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Fenton St &
## Ellsworth Dr ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="King St Metro"]<-"Alexandria"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "King St
## Metro", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Lincoln Park / 13th & East Capitol St NE "]<-"Washington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Lincoln Park /
## 13th & East Capitol St NE ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Montgomery Ave & Waverly St "]<-"Bethesda"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Montgomery Ave
## & Waverly St ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="N Adams St & Lee Hwy"]<-"Arlington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "N Adams St &
## Lee Hwy", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="N Quincy St & Wilson Blvd"]<-"Arlington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "N Quincy St &
## Wilson Blvd", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="S Abingdon St & 36th St S"]<-"Arlington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "S Abingdon St
## & 36th St S", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="N Nelson St & Lee Hwy"]<-"Arlington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "N Nelson St &
## Lee Hwy", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Fenton St & New York Ave "]<-"Silver Spring"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Fenton St &
## New York Ave ", : invalid factor level, NA generated
master_df$City[master_df$Start.station=="Alta Tech Office"]<-"Washington"
## Warning in `[<-.factor`(`*tmp*`, master_df$Start.station == "Alta Tech
## Office", : invalid factor level, NA generated
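The "invalid factor level, NA generated" warnings above arise because City is stored as a factor and the assigned city names are not among its existing levels, so R writes NA instead of the new value. A minimal sketch of one way to avoid this is shown below, using toy vectors: the names city, station, and city_chr are illustrative assumptions, not objects from the analysis. The assignment is made on a character copy, which is converted back to a factor afterwards.

```r
# Toy data (assumed for illustration): City is a factor whose levels carry a leading space
city <- factor(c(" Arlington", NA, " Washington"))
station <- c("N Adams St & Lee Hwy", "King St Metro", "Alta Tech Office")

# Work on a trimmed character copy so new values are not rejected as invalid levels
city_chr <- trimws(as.character(city))
city_chr[station == "King St Metro"] <- "Alexandria"

# Convert back to a factor once all values are in place
city <- factor(city_chr)
levels(city)
```

Doing the conversion once, before the station-by-station assignments, would also make the later whitespace clean-up of the City levels unnecessary.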
\section{Data Cleaning and Feature Engineering}

With all of the individual datasets combined into a single master dataframe, we moved on to the data cleaning stage.

The primary purpose of the data cleaning stage was to make variable data type transformations and address any missing values so that the data would be easier to work with in the exploratory data analysis (EDA) and modeling stages.

The primary purpose of the feature engineering stage was to create new variables based on combinations or derivatives of existing variables. Our intuition was that these variables would have a large impact on the prediction models.

The following sequence of commentary and code describes the data cleaning and feature engineering that was conducted.

Changed the format of the date variables using the lubridate package.

master_df$date<- ymd(master_df$date)
master_df$Start.date<- ymd_hms(master_df$Start.date)
master_df$End.date<- ymd_hms(master_df$End.date)
head(master_df$date)
## [1] "2015-11-02" "2015-05-27" "2015-06-23" "2015-05-19" "2015-11-13"
## [6] "2015-08-29"
head(master_df$Start.date)
## [1] "2015-11-02 17:39:00 UTC" "2015-05-27 17:33:00 UTC"
## [3] "2015-06-23 18:09:00 UTC" "2015-05-19 19:00:00 UTC"
## [5] "2015-11-13 11:47:00 UTC" "2015-08-29 14:19:00 UTC"
head(master_df$End.date)
## [1] "2015-11-02 18:07:00 UTC" "2015-05-27 17:42:00 UTC"
## [3] "2015-06-23 18:15:00 UTC" "2015-05-19 19:02:00 UTC"
## [5] "2015-11-13 11:50:00 UTC" "2015-08-29 14:50:00 UTC"

Changed precipitation variable to type numeric.

master_df$PrecipitationIn<- as.numeric(as.character(master_df$PrecipitationIn))
## Warning: NAs introduced by coercion
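The coercion warning above means that some precipitation entries are not numeric strings. A short, hedged sketch of how such entries could be inspected before converting is given below. The toy values, including "T" as a trace-precipitation marker, are assumptions for illustration and were not taken from the actual weather file.

```r
# Toy precipitation column stored as a factor (values assumed for illustration)
precip_raw <- factor(c("0.00", "0.12", "T", "", "1.05"))

# Convert via character so the labels, not the factor codes, are parsed
precip_chr <- as.character(precip_raw)
precip_num <- suppressWarnings(as.numeric(precip_chr))

# List the distinct raw values that could not be parsed as numbers
bad_values <- unique(precip_chr[is.na(precip_num)])
bad_values
```

Knowing which raw values turn into NA makes it easier to decide whether they should be imputed, set to zero, or treated as a small positive amount.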

Created a new hour variable.

master_df$hour<- hour(master_df$Start.date)

Added a weekday variable holding the name of the day.

master_df$weekday<- weekdays(master_df$date)

Created a binary weekend variable.

master_df$weekend<- ifelse(master_df$weekday=="Saturday"|master_df$weekday=="Sunday",1,0)

Created a binary rush hour variable.

# Rush hour: 7:00-9:59 in the morning or 16:00-19:59 in the evening
master_df$rushhour <- ifelse((master_df$hour >= 7 & master_df$hour <= 9) |
                               (master_df$hour >= 16 & master_df$hour <= 19), 1, 0)
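The rush hour flag relies on & binding more tightly than | in R, so the two hour ranges are evaluated as separate groups. A small sketch on the 24 possible hour values (the vector names are illustrative assumptions) confirms that the explicit comparison and an equivalent %in% formulation select the same hours:

```r
# All possible hour values
hours <- 0:23

# Explicitly parenthesised version of the rush hour rule
rush_explicit <- ifelse((hours >= 7 & hours <= 9) | (hours >= 16 & hours <= 19), 1, 0)

# Equivalent formulation using set membership
rush_in <- ifelse(hours %in% c(7:9, 16:19), 1, 0)

# Both rules flag the same hours
identical(rush_explicit, rush_in)
hours[rush_explicit == 1]
```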

Created a binary holiday variable.

# Dates flagged as holidays in this analysis
holiday_dates <- as.Date(c("2015-01-01", "2015-01-19", "2015-02-16", "2015-04-16", "2015-04-17",
                           "2015-05-22", "2015-05-25", "2015-05-26", "2015-07-02", "2015-07-03",
                           "2015-07-06", "2015-09-04", "2015-09-07", "2015-09-08", "2015-10-12",
                           "2015-11-11", "2015-11-26", "2015-11-27", "2015-12-24", "2015-12-25",
                           "2015-12-31"))
master_df$holiday <- ifelse(master_df$date %in% holiday_dates, 1, 0)

Combined the holiday and weekend variables for a new binary variable.

master_df$weekend_holiday<- ifelse(master_df$weekend==1 | master_df$holiday == 1,1,0)

Revalued subscription type, as certain datasets used “Member” and other datasets used “Registered” to signify a “subscriber”. Therefore, we combined these values.

master_df$Subscription.Type<-revalue(master_df$Subscription.Type,c('Member'='Registered'))
detach(package:plyr)

Created a "feels like" temperature variable.

master_df$Mean.Humidity<-master_df$Mean.Humidity/100
master_df$feellike<-0.363445176+
  0.98862246*(master_df$Mean.TemperatureF)+
  4.777114035*(master_df$Mean.Humidity)+
  -0.114037667*(master_df$Mean.TemperatureF*master_df$Mean.Humidity)+
  -0.000850208*(master_df$Mean.TemperatureF^2)+
  -0.020716198*(master_df$Mean.Humidity^2)+
  0.000687678*((master_df$Mean.TemperatureF^2)*master_df$Mean.Humidity)+
  0.000274954*(master_df$Mean.TemperatureF*(master_df$Mean.Humidity^2))+
  0*((master_df$Mean.TemperatureF^2)*(master_df$Mean.Humidity^2))
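The expression above is a second-order polynomial in temperature and humidity. As a hedged sketch, the same coefficients can be wrapped in a small vectorised helper so that the role of each term is easier to read. The function name feels_like and its argument names are assumptions introduced here, and humidity is taken as a fraction because the code above divides Mean.Humidity by 100.

```r
# Hypothetical helper reproducing the polynomial above
# temp_f: temperature in degrees Fahrenheit; humidity: relative humidity as a fraction (0-1)
feels_like <- function(temp_f, humidity) {
  0.363445176 +
    0.98862246   * temp_f +
    4.777114035  * humidity -
    0.114037667  * temp_f * humidity -
    0.000850208  * temp_f^2 -
    0.020716198  * humidity^2 +
    0.000687678  * temp_f^2 * humidity +
    0.000274954  * temp_f * humidity^2
}

# Works element-wise on vectors of equal length
feels_like(c(50, 70, 90), c(0.4, 0.6, 0.8))
```

The final term of the original expression has a zero coefficient and is therefore omitted from the helper.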

Created season variable.

getSeason <- function(DATES) {
  WS <- as.Date("2012-12-15", format="%Y-%m-%d") # Winter Solstice
  SE <- as.Date("2012-03-15", format="%Y-%m-%d") # Spring Equinox
  SS <- as.Date("2012-06-15", format="%Y-%m-%d") # Summer Solstice
  FE <- as.Date("2012-09-15", format="%Y-%m-%d") # Fall Equinox
  d <- as.Date(strftime(DATES, format="2012-%m-%d"))
  ifelse (d>=WS|d<SE, "Winter", ifelse (d>=SE&d<SS,"Spring", ifelse(d>=SS&d<FE,"Summer","Fall")))
}
master_df$season<-getSeason(master_df$date)
master_df$season<-as.factor(master_df$season)
summary(master_df$season)
##    Fall  Spring  Summer  Winter 
##  799058  957814 1048759  387277
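As a quick check of the season boundaries, the function can be applied to a handful of dates. The sketch below copies getSeason from above so that it runs on its own; the sample dates are arbitrary choices. Note that the boundaries fall on the 15th of the month, which only approximates the astronomical solstices and equinoxes.

```r
# getSeason copied from above so the snippet is self-contained
getSeason <- function(DATES) {
  WS <- as.Date("2012-12-15", format = "%Y-%m-%d")
  SE <- as.Date("2012-03-15", format = "%Y-%m-%d")
  SS <- as.Date("2012-06-15", format = "%Y-%m-%d")
  FE <- as.Date("2012-09-15", format = "%Y-%m-%d")
  # Map every date onto the same reference year so only month and day matter
  d <- as.Date(strftime(DATES, format = "2012-%m-%d"))
  ifelse(d >= WS | d < SE, "Winter",
         ifelse(d >= SE & d < SS, "Spring",
                ifelse(d >= SS & d < FE, "Summer", "Fall")))
}

# Arbitrary sample dates, one or more per season
sample_dates <- as.Date(c("2015-01-10", "2015-04-01", "2015-07-04", "2015-10-31", "2015-12-20"))
sample_seasons <- getSeason(sample_dates)
sample_seasons
```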

Created a variable for adverse weather.

master_df$AdverseWeather <- ifelse(master_df$Events == "","False","True")
master_df$AdverseWeather <- as.factor(master_df$AdverseWeather)
summary(master_df$AdverseWeather)
##   False    True 
## 1972462 1220446

Created a variable for beautiful weather.

master_df$BeautifulWeather <- ifelse(master_df$Events == "" & master_df$Mean.TemperatureF >= 50 & master_df$Mean.TemperatureF <= 85,"True","False")
master_df$BeautifulWeather <- as.factor(master_df$BeautifulWeather)
summary(master_df$BeautifulWeather)
##   False    True 
## 1578648 1614260

Removed the leading whitespace from the city variable.

levels(master_df$City)
##  [1] " Alexandria"    " Arlington"     " Bethesda"      " Chevy Chase"  
##  [5] " Derwood"       " McLean"        " Potomac"       " Reston"       
##  [9] " Rockville"     " Silver Spring" " Takoma Park"   " Tysons"       
## [13] " Vienna"        " Washington"
# Strip leading/trailing whitespace from every level at once
levels(master_df$City) <- trimws(levels(master_df$City))
levels(master_df$City)
##  [1] "Alexandria"    "Arlington"     "Bethesda"      "Chevy Chase"  
##  [5] "Derwood"       "McLean"        "Potomac"       "Reston"       
##  [9] "Rockville"     "Silver Spring" "Takoma Park"   "Tysons"       
## [13] "Vienna"        "Washington"

Created a month variable.

master_df$month<- month(master_df$date)

The precipitation variable was missing quite a few values. However, we could fill these in with a best guess by using the weather event variable and the average precipitation by month.

If the weather event variable was blank, we treated the day as a good-weather day and set the missing precipitation to zero. Conversely, if the weather event variable was not blank, it was likely that precipitation occurred, so we filled in the missing value with the average precipitation for the respective month.

average_month_precipitation <- master_df %>%
  select(month, PrecipitationIn) %>%
  group_by(month) %>%
  summarise(avg_precipitation = mean(PrecipitationIn, na.rm = TRUE))
master_df <- merge(master_df, average_month_precipitation, by = "month")
# Impute missing precipitation: monthly average on days with a weather event, zero otherwise
master_df$new_precipitation <-
  ifelse(is.na(master_df$PrecipitationIn) & master_df$Events != "",
         master_df$avg_precipitation,
         ifelse(is.na(master_df$PrecipitationIn) & master_df$Events == "",
                0,
                master_df$PrecipitationIn))
#Drop variable used for calculation
master_df$avg_precipitation<-NULL
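The imputation rule can be traced on a small example. The sketch below uses a toy data frame and base R's ave() in place of the merge; the object names toy and month_avg and all values are assumptions chosen for illustration.

```r
# Toy data (assumed): two months with a mix of observed and missing precipitation
toy <- data.frame(
  month = c(1, 1, 1, 2, 2, 2),
  Events = c("", "Rain", "Rain", "", "Snow", "Rain"),
  PrecipitationIn = c(NA, 0.4, NA, 0.0, NA, 0.6),
  stringsAsFactors = FALSE
)

# Monthly mean of the observed values, repeated for every row of that month
month_avg <- ave(toy$PrecipitationIn, toy$month,
                 FUN = function(v) mean(v, na.rm = TRUE))

# Missing + event -> monthly mean; missing + no event -> 0; otherwise keep the observation
toy$new_precipitation <- ifelse(is.na(toy$PrecipitationIn) & toy$Events != "",
                                month_avg,
                                ifelse(is.na(toy$PrecipitationIn), 0, toy$PrecipitationIn))
toy
```

In this toy case the missing rainy day in month 1 receives 0.4, the missing snowy day in month 2 receives 0.3, and the missing event-free day receives 0.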

Converted the weekday, weekend, holiday, rushhour, weekend_holiday, cloud cover, hour, and zip variables to factors.

master_df$weekday<- as.factor(master_df$weekday)
master_df$weekday<- factor(master_df$weekday, levels = c("Monday", "Tuesday", "Wednesday","Thursday","Friday","Saturday","Sunday"))
master_df$weekend<- as.factor(master_df$weekend)
master_df$holiday<- as.factor(master_df$holiday)
master_df$rushhour<- as.factor(master_df$rushhour)
master_df$weekend_holiday<- as.factor(master_df$weekend_holiday)
master_df$CloudCover<- as.factor(master_df$CloudCover)
master_df$hour<- as.factor(master_df$hour)
master_df$Zip<- as.factor(master_df$Zip)

The dimensions of the final dataframe and its list of variables are shown below.

dim(master_df)
## [1] 3192908      49
names(master_df)
##  [1] "month"                     "Start.station"            
##  [3] "date"                      "Total.duration..ms."      
##  [5] "Start.date"                "End.date"                 
##  [7] "End.station"               "Bike.number"              
##  [9] "Subscription.Type"         "Max.TemperatureF"         
## [11] "Mean.TemperatureF"         "Min.TemperatureF"         
## [13] "Max.Dew.PointF"            "MeanDew.PointF"           
## [15] "Min.DewpointF"             "Max.Humidity"             
## [17] "Mean.Humidity"             "Min.Humidity"             
## [19] "Max.Sea.Level.PressureIn"  "Mean.Sea.Level.PressureIn"
## [21] "Min.Sea.Level.PressureIn"  "Max.VisibilityMiles"      
## [23] "Mean.VisibilityMiles"      "Min.VisibilityMiles"      
## [25] "Max.Wind.SpeedMPH"         "Mean.Wind.SpeedMPH"       
## [27] "Max.Gust.SpeedMPH"         "PrecipitationIn"          
## [29] "CloudCover"                "Events"                   
## [31] "WindDirDegrees"            "LATITUDE"                 
## [33] "LONGITUDE"                 "Address"                  
## [35] "City"                      "State"                    
## [37] "Zip"                       "Country"                  
## [39] "hour"                      "weekday"                  
## [41] "weekend"                   "rushhour"                 
## [43] "holiday"                   "weekend_holiday"          
## [45] "feellike"                  "season"                   
## [47] "AdverseWeather"            "BeautifulWeather"         
## [49] "new_precipitation"

Saved a copy of the master dataset.

write.csv(master_df,"master_df.csv")
\section{2.3 Exploratory Data Analysis}

The data exploration stage focused on visualizing the relationships between variables and exploring patterns within the dataset.

The following sequence of commentary and code showcases the EDA that was conducted.

# Fine tune master_df before creating EDAs
master_df %>%
  mutate(
    date = ymd(date),
    weekday = factor(weekday,
                     levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")),
    season = factor(season, levels = c("Spring", "Summer", "Fall", "Winter")),
    hour = factor(hour, levels = 0:23),
    duration.min = round(Total.duration..ms. / 60000, digits = 2)
  ) %>%
  dplyr::select(Start.station, date, duration.min, End.station, Subscription.Type, CloudCover, Events, LATITUDE, LONGITUDE, Address, City, Zip, hour, weekday, weekend, rushhour, holiday, season, AdverseWeather, BeautifulWeather, weekend_holiday) -> map_df

We first examine the distributions of the two main continuous variables in the dataset, total.rides and avg.duration. The levels of five categorical variables (Subscription.Type, weekend_holiday, rushhour, season, and AdverseWeather) are used as fill colors to give a high-level comparison of the distributions between groups.

map_df %>%
  group_by(date, hour, Subscription.Type) %>%
  summarise(
    total.rides = n(),
    avg.duration = mean(duration.min), 
    weekend_holiday = first(weekend_holiday),
    weekday = first(weekday),
    rushhour = first(rushhour),
    season = first(season),
    AdverseWeather = first(AdverseWeather)
  ) -> day_hour_rides
# Create distribution histograms
g1 <- ggplot(data=day_hour_rides)
# Refer to columns by name inside aes() rather than via day_hour_rides$...
g1 + geom_histogram(mapping = aes(total.rides, fill = Subscription.Type), binwidth = 0.5) -> g2
g1 + geom_histogram(mapping = aes(log(avg.duration), fill = Subscription.Type), bins = 100) -> g3
g1 + geom_histogram(mapping = aes(total.rides, fill = weekend_holiday), binwidth = 0.5) -> g4
g1 + geom_histogram(mapping = aes(log(avg.duration), fill = weekend_holiday), bins = 100) -> g5
g1 + geom_histogram(mapping = aes(total.rides, fill = as.factor(rushhour)), binwidth = 0.5) -> g6
g1 + geom_histogram(mapping = aes(log(avg.duration), fill = as.factor(rushhour)), bins = 100) -> g7
g1 + geom_histogram(mapping = aes(total.rides, fill = as.factor(season)), binwidth = 0.5) -> g8
g1 + geom_histogram(mapping = aes(log(avg.duration), fill = as.factor(season)), bins = 100) -> g9
g1 + geom_histogram(mapping = aes(total.rides, fill = as.factor(AdverseWeather)), binwidth = 0.5) -> g10
g1 + geom_histogram(mapping = aes(log(avg.duration),
                                  fill = as.factor(AdverseWeather)), bins = 100) -> g11
# Display the histograms
plot_grid(g2, g3, nrow = 2, rel_widths = c(1/2, 1/2))

plot_grid(g4, g5, nrow = 2, rel_widths = c(1/2, 1/2))

plot_grid(g6, g7, nrow = 2, rel_widths = c(1/2, 1/2))

plot_grid(g8, g9, nrow = 2, rel_widths = c(1/2, 1/2))

plot_grid(g10, g11, nrow = 2, rel_widths = c(1/2, 1/2))

Our first impression is that the distribution of total.rides is right-skewed, while the distribution of avg.duration is bimodal.

More specifically, the avg.duration distribution by Subscription.Type indicates that registered bikers account for the lower-duration mode, while casual bikers account for the higher mode. Casual bikers also take far fewer total.rides than registered bikers. In the distribution by rushhour, commuting-hour rides dominate the hours with higher total.rides counts, and rush hour rides also contribute more to the lower avg.duration mode. Another interesting finding, from the distribution by season, is that winter has many more short-duration rides than other seasons, while spring and summer have more long-duration rides among casual riders.

The above analysis indicates that time-related factors have a strong impact on the dependent variables. In our next step, we create heatmaps of hour of the day by day of the week to further explore these patterns.

# Create a subset just for the time heatmap
day_hour_rides %>%
  ungroup() %>%
  select(hour, weekday, total.rides, avg.duration) %>%
  mutate(total_duration = total.rides * avg.duration, 
         hour = factor(hour, levels = (0:23))) %>%
  group_by(hour, weekday) %>%
  summarise(count.rides = sum(total.rides), total.duration = sum(total_duration)) -> df.1
# Create time based heatmaps
g10 <- ggplot(data=df.1, aes(x=hour, y=weekday, fill=count.rides)) +
  geom_tile(color="white", size=0.1)+ coord_equal() +
  labs(x=NULL, y=NULL, title="Count of Rides Per Weekday & Hour of Day") +
  theme_tufte(base_family="Calibri") + theme(plot.title=element_text(hjust=0.5, size = 10)) +
  theme(axis.ticks=element_blank()) + theme(axis.text=element_text(size=7)) + theme(legend.position="none") +
  scale_fill_gradient(low = "white", high = "steelblue")
g11 <- ggplot(data=df.1, aes(x=hour, y=weekday, fill=total.duration)) +
  geom_tile(color="white", size=0.1)+ coord_equal() +
  labs(x=NULL, y=NULL, title="Total Duration Per Weekday & Hour of Day") +
  theme_tufte(base_family="Calibri") + theme(plot.title=element_text(hjust=0.5, size = 10)) + theme(legend.position="none") +
  theme(axis.ticks=element_blank()) + theme(axis.text=element_text(size=7)) +
  scale_fill_gradient(low = "white", high = "firebrick")
g12 <- ggplot(data=df.1, aes(x=hour, y=weekday, fill=total.duration/count.rides)) +
  geom_tile(color="white", size=0.1)+ coord_equal() +
  labs(x=NULL, y=NULL, title="Average Duration Per Weekday & Hour of Day") +
  theme_tufte(base_family="Calibri") + theme(plot.title=element_text(hjust=0.5, size = 10)) + theme(legend.position="none") +
  theme(axis.ticks=element_blank()) + theme(axis.text=element_text(size=7)) +
  scale_fill_gradient(low = "white", high = "springgreen3")
plot_grid(g10, g12, nrow = 2, rel_heights = c(1/2, 1/2))

The hour-weekday heatmaps show some interesting patterns. More rides take place during rush hours on workdays, while on weekends total.rides is spread fairly evenly across the daytime. The avg.duration of rides also appears to be longer during the daytime on weekends.
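The heatmap code above rebuilds a total duration from total.rides * avg.duration before re-aggregating, and only then divides by the ride count. A plain mean of the hourly averages would weight a quiet hour as heavily as a busy one. The toy numbers below are assumptions chosen to make the difference visible.

```r
# Two hourly cells (assumed values): a busy hour with short rides, a quiet hour with one long ride
total.rides  <- c(10, 1)
avg.duration <- c(5, 60)   # minutes

# Ride-weighted average: rebuild total duration, then divide by total rides
weighted_avg <- sum(total.rides * avg.duration) / sum(total.rides)

# Unweighted mean of the two hourly averages
unweighted_avg <- mean(avg.duration)

c(weighted = weighted_avg, unweighted = unweighted_avg)
```

Here the ride-weighted average is 10 minutes while the unweighted mean is 32.5 minutes, which is why the heatmap uses total.duration / count.rides.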

Now that we have a general understanding of the data, we move on to explore the geospatial distribution of total.rides across the DC metro area. First, let us plot the bike stations.

# Create station list with coordinates, total count of rides, and total duration of rides
map.stations <- map_df %>%
  group_by(Start.station) %>%
  summarise(total.rides = n(),
            avg.duration = mean(duration.min),
            subscriber.percentage = mean(Subscription.Type == "Registered"),
            lat = first(LATITUDE),
            lon = first(LONGITUDE)
            )
head(map.stations)
## # A tibble: 6 × 6
##                   Start.station total.rides avg.duration
##                          <fctr>       <int>        <dbl>
## 1                10th & E St NW       13611     25.61816
## 2         10th & Florida Ave NW        8316     12.16949
## 3           10th & Monroe St NE        3916     15.95705
## 4                10th & U St NW       13463     12.52403
## 5 10th St & Constitution Ave NW       19128     28.62108
## 6                11th & F St NW       13898     20.94567
## # ... with 3 more variables: subscriber.percentage <dbl>, lat <dbl>,
## #   lon <dbl>
# Plotly not working, skip
g <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showland = TRUE,
  landcolor = toRGB("gray85"),
  subunitwidth = 1,
  countrywidth = 1,
  subunitcolor = toRGB("white"),
  countrycolor = toRGB("white")
)

p <- plot_geo(map.stations, locationmode = 'city', sizes = c(1, 250)) %>%
  add_markers(
    x = ~lon, y = ~lat, size = ~total.rides, color = ~avg.duration, hoverinfo = "text",
    text = ~paste(map.stations$Start.station, "<br />",
                  "Total Rides: ", map.stations$total.rides, "<br />", 
                  "Average Duration: ", map.stations$avg.duration, " mins", "<br />",
                  "Percentage of Subscribers: ", map.stations$subscriber.percentage)
  ) %>%
  layout(title = '2015 Capital Bike Share Stations', geo = g)

Below we can see the locations of all the bike share stations across the DMV area, with the circle size representing total.rides and the color representing avg.duration. The bike stations appear to be well spread out across the DMV area, including stations in outskirts such as Alexandria, VA, Bethesda, MD, and Silver Spring, MD.

# download basic map layers for plotting
base.map <- qmap("Washington DC", zoom = 12, source= "google", maptype="roadmap", color = "bw", crop=FALSE, legend='topleft')
base.map.1 <- qmap("Washington DC", zoom = 13, source= "google", maptype="roadmap", color = "bw", crop=FALSE, legend='topleft')
base.map.2 <- qmap("Washington DC", zoom = 14, source= "google", maptype="roadmap", color = "bw", crop=FALSE, legend='topleft')
base.map + geom_point(aes(x = lon, y = lat, size=total.rides, color=avg.duration), data = map.stations,
 alpha = .5)+ scale_size(range = c(1, 5)) + scale_colour_gradient(low = "steelblue", high = "springgreen")

base.map.1 + geom_point(aes(x = lon, y = lat, size=total.rides, color=avg.duration), data = map.stations,
 alpha = .5) + scale_size(range = c(1, 5)) + scale_colour_gradient(low = "steelblue", high = "springgreen")

base.map.2 + geom_point(aes(x = lon, y = lat, size=total.rides, color=avg.duration), data = map.stations,
 alpha = .5) + scale_size(range = c(1, 10)) + scale_colour_gradient(low = "steelblue", high = "springgreen")

But how is the actual count of total.rides distributed across the area? Does it follow the bike station locations? To find out, we create a heatmap based on the density of total.rides on the map. The graph below indicates that total.rides are far more concentrated than the bike stations themselves, with most rides happening in the heart of DC, around Dupont Circle, Logan Circle, the National Mall, Metro Center, Gallery Place, the World Bank, and the Lincoln Memorial.

# Create a ride data set with location and ride, will also keep sliceability with other factors
# Adjust factor level names for better display in faceted visuals
map_df %>%
  mutate(lon = LONGITUDE, lat = LATITUDE) %>%
  select(Subscription.Type, Events, lat, lon, hour,
         weekday, weekend, rushhour, holiday, season, AdverseWeather, BeautifulWeather, weekend_holiday) %>%
  mutate(
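    # hour is a factor here: go through character so we get 0-23, not the level codes 1-24
    hour = as.numeric(as.character(hour)),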
    AdverseWeather = as.factor(if_else(AdverseWeather=="True", "Adverse: Yes", "Adverse: No")),
    BeautifulWeather = as.factor(if_else(BeautifulWeather == "True", "Beautiful: Yes", "Beautiful: No")),
    holiday = as.factor(if_else(holiday == "1", "Holiday: Yes", "Holiday: No")),
    weekend = as.factor(if_else(weekend == "1", "Weekend: Yes", "Weekend: No")),
    rushhour = as.factor(if_else(rushhour == "1", "Rush Hour: Yes", "Rush Hour: No")),
    weekend_holiday = as.factor(if_else(weekend_holiday == "1", "Leisure Day: Yes", "Leisure Day: No")),
    time_of_day = factor(if_else(hour > 4 & hour < 13, "Morning",
                                    if_else(hour > 12 & hour < 19, "Afternoon", 
                                            if_else(hour >= 19 & hour <= 23, "Night", "Late Night"))),
                            levels = c("Morning", "Afternoon", "Night", "Late Night")),
    hour = factor(hour, levels = 0:23)) -> ride_df
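When a factor such as hour is converted back to a number, as.numeric() applied directly returns the internal level codes rather than the labels. A minimal illustration follows, with a toy factor whose values are assumed for the example:

```r
# Toy hour factor with the same levels as in the analysis
h <- factor(c(0, 5, 23), levels = 0:23)

# Direct conversion returns the position of each level (1-based codes)
codes <- as.numeric(h)

# Converting through character recovers the original hour values
values <- as.numeric(as.character(h))

rbind(codes = codes, values = values)
```

Because the codes are shifted by one relative to the hour labels, any threshold applied to them (such as the time_of_day cut-offs) would be shifted by an hour as well.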
# Create ride density maps
base.map + geom_density2d(data = ride_df[sample(1:nrow(ride_df), 10000),], 
    aes(x = lon, y = lat), size = 0.4) + stat_density2d(data = ride_df[sample(1:nrow(ride_df), 10000),], 
    aes(x = lon, y = lat, fill = ..level.., alpha = ..level..), size = 1, 
    bins = 5, geom = "polygon", contour = TRUE) + scale_fill_gradient(low = "springgreen", high = "red") + 
    scale_alpha(range = c(0, 0.3), guide = FALSE)

base.map.1 + geom_density2d(data = ride_df[sample(1:nrow(ride_df), 10000),], 
    aes(x = lon, y = lat), size = 0.4) + stat_density2d(data = ride_df[sample(1:nrow(ride_df), 10000),], 
    aes(x = lon, y = lat, fill = ..level.., alpha = ..level..), size = 2, 
    bins = 8, geom = "polygon", contour = TRUE) + scale_fill_gradient(low = "springgreen", high = "red") + 
    scale_alpha(range = c(0, 0.3), guide = FALSE)

base.map.2 + geom_density2d(data = ride_df[sample(1:nrow(ride_df), 10000),], 
    aes(x = lon, y = lat), size = 0.5) + stat_density2d(data = ride_df[sample(1:nrow(ride_df), 10000),], 
    aes(x = lon, y = lat, fill = ..level.., alpha = ..level..), size = 3, 
    bins = 15, geom = "polygon", contour = TRUE) + scale_fill_gradient(low = "springgreen", high = "red") + 
    scale_alpha(range = c(0, 0.3), guide = FALSE)
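The density maps above draw a fresh random sample for each layer, so the contour lines and the filled polygons are based on different rows, and the figures change on every run. One hedged option is to fix a seed and draw the sample once, then pass the same subset to both layers. The seed value, the toy coordinates, and the names toy_rides and ride_sample below are assumptions for illustration.

```r
# Arbitrary seed (an assumption) so the sample can be reproduced
set.seed(2015)

# Toy ride locations roughly within the DC bounding box (simulated, not real data)
toy_rides <- data.frame(lon = runif(1000, -77.12, -76.91),
                        lat = runif(1000, 38.80, 39.00))

# Draw the row indices once and reuse the same subset for every plot layer
sample_idx <- sample(seq_len(nrow(toy_rides)), 100)
ride_sample <- toy_rides[sample_idx, ]

dim(ride_sample)
```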

Since we now have a general idea of where the most rides are happening in DC, our next step is to slice the ridership data with factors we generated from time and weather and compare the patterns. We wanted to see if the popularity of the stations changed under different time and weather conditions.

# Create a subsliced ridership set of 15000 observations
ride_df.sample <- ride_df[sample(1:nrow(ride_df), 15000),]
# Ride frequency heatmap by seasons
dc.map.3 + stat_density2d(aes(x=lon, y=lat, fill=..level.., alpha=..level..),
                          bins=7, geom="polygon", data=ride_df.sample) +
  scale_fill_gradient(low="springgreen", high="tomato") + scale_alpha(range = c(0.1, 0.6), guide = FALSE) + 
  facet_wrap(~season, nrow = 1) +
  guides(fill=guide_legend(title="ride\nfrequency")) +
  ggtitle("Ride Distribution by Seasons") +
  theme(axis.title=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        legend.text = element_blank(),
        plot.title = element_text(color="black", size=16, hjust=0)) 

The first graph shows the distribution of rides in each season of 2015. In spring and summer, both the Lincoln Memorial and the National Mall see more rides than at other times of the year. During winter, however, more people appear to take bike rides around Logan Circle, Foggy Bottom, and Metro Center, i.e. the inner center of the District.

# Ride frequency heatmap by time of day
dc.map.3 + stat_density2d(aes(x=lon, y=lat, fill=..level.., alpha=..level..),
                          bins=7, geom="polygon", data=ride_df.sample) +
  scale_fill_gradient(low="springgreen", high="tomato") + scale_alpha(range = c(0.1, 0.6), guide = FALSE) + 
  facet_wrap(~time_of_day, nrow = 1) +
  guides(fill=guide_legend(title="ride\nfrequency")) +
  ggtitle("Ride Distribution by Time of Day") +
  theme(axis.title=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        legend.text = element_blank(),
        plot.title = element_text(color="black", size=16, hjust=0)) 

A similar comparison based on time of day shows that people take more rides in central to northeastern DC in the morning and more in central to southwestern DC in the afternoon. At night, bikers mostly start their rides around Dupont Circle, Logan Circle, Metro Center, and Gallery Place. Few people start rides late at night, as expected, but relatively more of these rides begin in the central to northwestern DC area. People's daily routines seem to contribute to this pattern, considering that these areas correspond to the residential, working, and entertainment/event areas of DC.

# Ride frequency heatmap by rush hour
dc.map.3 + stat_density2d(aes(x=lon, y=lat, fill=..level.., alpha=..level..),
                          bins=7, geom="polygon", data=ride_df.sample) +
  scale_fill_gradient(low="springgreen", high="tomato") + scale_alpha(range = c(0.1, 0.6), guide = FALSE) + 
  facet_wrap(~rushhour) +
  guides(fill=guide_legend(title="ride\nfrequency")) +
  ggtitle("Ride Distribution - Rush Hour?") +
  theme(axis.title=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        legend.text = element_blank(),
        plot.title = element_text(color="black", size=16, hjust=0))

Since time has an interesting impact on total.rides and bikes can be a useful tool for commuting, we want to look specifically at how rides are distributed during rush hours compared with other times of the day. In the above graph, we notice that more people take bike rides near Metro Center, Gallery Place, and Capitol Hill during rush hours, while more people take rides near the Lincoln Memorial and the National Mall during non-rush hours. This is interesting, since Metro Center, Gallery Place, and Capitol Hill are places where many people go to work, while the Lincoln Memorial and the National Mall are popular tourist sites.

# Ride frequency heatmap by weekend/holiday
dc.map.3 + stat_density2d(aes(x=lon, y=lat, fill=..level.., alpha=..level..),
                          bins=7, geom="polygon", data=ride_df.sample) +
  scale_fill_gradient(low="springgreen", high="tomato") + scale_alpha(range = c(0.1, 0.6), guide = FALSE) + 
  facet_wrap(~weekend_holiday + BeautifulWeather, nrow = 1) +
  guides(fill=guide_legend(title="ride\nfrequency")) +
  ggtitle("Ride Distribution - Leisure Days X Good Weather") +
  theme(axis.title=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        legend.text = element_blank(),
        plot.title = element_text(color="black", size=16, hjust=0))

Since the Lincoln Memorial and the National Mall are popular in non-rush hours, we are interested in whether leisure time shows a different pattern in the distribution of total.rides. Comparing the left two graphs in the above chart, it is apparent that ridership is more dispersed on leisure days in good weather: riders start their rides from many different stations across the District. Interestingly, the second graph from the left shows that bikers still mostly ride in central DC during working days despite the good weather. Commuting really seems to be a major function of the shared bikes!

Since commuting seems to be a major factor in the distribution of rides, we are interested in digging a bit deeper into the type of subscription for each ride. Since bike share subscribers are more likely to use bikes for commuting, will we see a clear difference between casual and registered bikers?

# Ride frequency heatmap by Subscription Type
dc.map.2 + stat_density2d(aes(x=lon, y=lat, fill=..level.., alpha=..level..),
                          bins=7, geom="polygon", data=ride_df.sample) +
  scale_fill_gradient(low="springgreen", high="tomato") + scale_alpha(range = c(0.1, 0.6), guide = FALSE) + 
  facet_wrap(~Subscription.Type + rushhour, nrow = 1) +
  guides(fill=guide_legend(title="ride\nfrequency")) +
  ggtitle("Ride Distribution by Subscription Type & Rush Hour") +
  theme(axis.title=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        legend.text = element_blank(),
        plot.title = element_text(color="black", size=16, hjust=0))

The above graph shows that casual bikers take more rides around the tourist attractions in DC, regardless of whether it is rush hour. For subscribers, however, the distribution of rides is surprisingly even whether or not it is rush hour. Considering the nature of commuting, this makes sense: people who ride based on their daily commuting needs use bikes both to get to work and to go home. The green area in the right two graphs actually shows the routine start stations for the registered users!

# Ride frequency heatmap by adverse weather
dc.map.2 + stat_density2d(aes(x=lon, y=lat, fill=..level.., alpha=..level..),
                          bins=7, geom="polygon", data=ride_df.sample) +
  scale_fill_gradient(low="springgreen", high="tomato") + scale_alpha(range = c(0.1, 0.6), guide = FALSE) + 
  facet_wrap(~AdverseWeather) +
  guides(fill=guide_legend(title="ride\nfrequency")) +
  ggtitle("Ride Distribution - Adverse Weather?") +
  theme(axis.title=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        legend.text = element_blank(),
        plot.title = element_text(color="black", size=16, hjust=0))

A quick comparison of adverse weather against non-adverse weather shows little difference in ridership. This might be due to the nature of our integrated weather data: the weather information consists of mean values for a whole day, which makes it hard for these slices to differentiate the ridership distribution at a finer (hourly) grain.

# Ride frequency heatmap by rush hour and adverse weather
dc.map.3 + stat_density2d(aes(x=lon, y=lat, fill=..level.., alpha=..level..),
                          bins=7, geom="polygon", data=ride_df.sample) +
  scale_fill_gradient(low="springgreen", high="tomato") + scale_alpha(range = c(0.1, 0.6), guide = FALSE) + 
  facet_wrap(~AdverseWeather + rushhour, nrow = 1) +
  guides(fill=guide_legend(title="ride\nfrequency")) +
  ggtitle("Ride Distribution - Bad Weather X Rush Hour") +
  theme(axis.title=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        legend.text = element_blank(),
        plot.title = element_text(color="black", size=16, hjust=0))

Again, in the graph above, we observe a bigger difference from rush hour than from the weather. This seems to be related to the same limitation of the daily weather variables.

\section{3. Modeling}

In total, three different models were tested: Multiple Linear Regression, Regression Tree and Random Forest. The prediction performance of the models was assessed based on the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
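
The two error metrics can be written as small helper functions. The sketch below is illustrative only: the function names `mae` and `rmse` and the toy ride counts are assumptions, not taken from the original analysis.

```r
# Hypothetical helpers; names and data are illustrative assumptions.
mae <- function(actual, predicted) {
  # Mean Absolute Error: average size of the prediction errors
  mean(abs(actual - predicted), na.rm = TRUE)
}

rmse <- function(actual, predicted) {
  # Root Mean Squared Error: penalizes large errors more heavily than MAE
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

# Toy example with made-up hourly ride counts
actual    <- c(120, 80, 45, 200)
predicted <- c(110, 95, 50, 170)

mae(actual, predicted)   # 15
rmse(actual, predicted)  # about 17.68
```

Because RMSE squares the errors before averaging, a single large miss (such as the 30-ride error above) raises it more than it raises MAE, so reporting both gives a sense of how uneven the errors are.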

Before creating the models, the final dataset was grouped by day and by hour (together with subscription type, city, and the daily weather attributes), since this was the level of ridership that we wanted to predict. We also dropped variables that would not be used for modeling. Lastly, we split the final dataset into training and test data frames using a 70/30 random split.

model_df <- master_df %>% 
        group_by(date,hour,Subscription.Type,Mean.TemperatureF,MeanDew.PointF,Mean.Humidity,Mean.Sea.Level.PressureIn,Mean.VisibilityMiles,Mean.Wind.SpeedMPH,new_precipitation,CloudCover,Events,City,weekday,weekend,rushhour,weekend_holiday,feellike,season,AdverseWeather,BeautifulWeather) %>% 
        summarise(total_rides = length(date))

dim(model_df)
## [1] 67966    22
names(model_df)
##  [1] "date"                      "hour"                     
##  [3] "Subscription.Type"         "Mean.TemperatureF"        
##  [5] "MeanDew.PointF"            "Mean.Humidity"            
##  [7] "Mean.Sea.Level.PressureIn" "Mean.VisibilityMiles"     
##  [9] "Mean.Wind.SpeedMPH"        "new_precipitation"        
## [11] "CloudCover"                "Events"                   
## [13] "City"                      "weekday"                  
## [15] "weekend"                   "rushhour"                 
## [17] "weekend_holiday"           "feellike"                 
## [19] "season"                    "AdverseWeather"           
## [21] "BeautifulWeather"          "total_rides"
smp_size <- floor(0.7 * nrow(model_df))
set.seed(700)
train_ind <- sample(seq_len(nrow(model_df)), size = smp_size)
linearreg_train <- model_df[train_ind, ]
linearreg_test <- model_df[-train_ind, ]
\section{Model 1 - Multiple Linear Regression Model}

A collinearity test was conducted for the numeric variables.

num_vars <- c("Mean.TemperatureF","MeanDew.PointF","Mean.Humidity","Mean.Sea.Level.PressureIn","Mean.VisibilityMiles","Mean.Wind.SpeedMPH", "new_precipitation")

collinear_test_df <- linearreg_train[num_vars]

plot(collinear_test_df)

# melt() comes from reshape2; labs() sets the axis titles through its x and y arguments
qplot(x=Var1, y=Var2, data = melt(cor(collinear_test_df)), fill=value, geom = "tile") + 
        labs(x = "Var1", y = "Var2") + 
        ggtitle("Correlation Coefficient Matrix")

Mean.TemperatureF and MeanDew.PointF were highly correlated, so MeanDew.PointF was dropped. Date was also dropped, since including it would have led to overfitting and created a factor with far too many levels to include in the model.
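
Besides inspecting the plots, highly correlated pairs can be listed programmatically. The following is a minimal sketch on simulated data: the `toy` data frame, its column names, and the 0.8 cutoff are assumptions for illustration and do not come from the original analysis.

```r
# Simulated stand-in for the weather columns (values are made up)
set.seed(1)
temp <- rnorm(200, mean = 60, sd = 15)
toy <- data.frame(
  temp     = temp,
  dewpoint = temp - 10 + rnorm(200, sd = 3),  # built to track temp closely
  wind     = rnorm(200, mean = 8, sd = 3)     # unrelated to the other two
)

cor_mat <- cor(toy, use = "complete.obs")

# Keep each pair once (upper triangle) and flag |r| above the cutoff
cutoff <- 0.8  # assumed threshold
idx <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
high_pairs
```

In this toy example only the `temp`/`dewpoint` pair exceeds the cutoff, which mirrors the decision to keep one of two closely related temperature measures.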

With the list of variables finalized, three model selection criteria were tested: adjusted R-squared, AIC, and BIC. These criteria penalize the inclusion of each additional variable differently, so we were interested in the impact this would have on each model's predictions.
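
To make the difference between the AIC and BIC penalties concrete, the sketch below uses R's built-in `mtcars` data as a stand-in; the model formula is illustrative and is not the bikeshare model. For a linear model, `extractAIC()` (which `stepAIC()` relies on) returns `n * log(RSS / n) + k * edf`, with `k = 2` for AIC and `k = log(n)` for BIC.

```r
# Illustrative model on built-in data; not the bikeshare model
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
n   <- nrow(mtcars)

aic_val <- extractAIC(fit, k = 2)       # returns c(edf, AIC)
bic_val <- extractAIC(fit, k = log(n))  # returns c(edf, BIC)

# Reproduce the BIC value by hand: n * log(RSS / n) + log(n) * edf
rss <- sum(residuals(fit)^2)
edf <- n - df.residual(fit)             # number of estimated coefficients
manual_bic <- n * log(rss / n) + log(n) * edf

# The two criteria differ only through the per-parameter penalty
bic_val[2] - aic_val[2]                 # equals (log(n) - 2) * edf
```

Since `log(n)` exceeds 2 once `n` is larger than about 7, BIC charges more per parameter than AIC on any realistically sized dataset, so it tends to select a smaller model.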

linearreg_train$MeanDew.PointF <- NULL
linearreg_train$date <- NULL
names(linearreg_train)
##  [1] "hour"                      "Subscription.Type"        
##  [3] "Mean.TemperatureF"         "Mean.Humidity"            
##  [5] "Mean.Sea.Level.PressureIn" "Mean.VisibilityMiles"     
##  [7] "Mean.Wind.SpeedMPH"        "new_precipitation"        
##  [9] "CloudCover"                "Events"                   
## [11] "City"                      "weekday"                  
## [13] "weekend"                   "rushhour"                 
## [15] "weekend_holiday"           "feellike"                 
## [17] "season"                    "AdverseWeather"           
## [19] "BeautifulWeather"          "total_rides"
m_full_linear <- lm(total_rides ~ ., data = na.omit(linearreg_train))
summary(m_full_linear)
## 
## Call:
## lm(formula = total_rides ~ ., data = na.omit(linearreg_train))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -227.91  -48.42  -14.57   30.25  914.53 
## 
## Coefficients: (3 not defined because of singularities)
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -420.2290    97.9117  -4.292 1.78e-05 ***
## hour1                         -28.1164     4.6378  -6.062 1.35e-09 ***
## hour2                         -40.1467     4.9679  -8.081 6.59e-16 ***
## hour3                         -71.0261     5.5359 -12.830  < 2e-16 ***
## hour4                         -66.6629     5.3610 -12.435  < 2e-16 ***
## hour5                          -4.7038     4.4733  -1.052 0.293025    
## hour6                          37.8042     4.0588   9.314  < 2e-16 ***
## hour7                          88.0880     3.8490  22.886  < 2e-16 ***
## hour8                         116.7852     3.7463  31.173  < 2e-16 ***
## hour9                          87.1247     3.7573  23.188  < 2e-16 ***
## hour10                         77.7497     3.7760  20.591  < 2e-16 ***
## hour11                         87.3201     3.7674  23.178  < 2e-16 ***
## hour12                         98.0514     3.7716  25.997  < 2e-16 ***
## hour13                         97.4776     3.7658  25.885  < 2e-16 ***
## hour14                         94.7998     3.7841  25.052  < 2e-16 ***
## hour15                        100.3481     3.7663  26.643  < 2e-16 ***
## hour16                        112.7898     3.7298  30.240  < 2e-16 ***
## hour17                        138.8497     3.7008  37.519  < 2e-16 ***
## hour18                        129.7897     3.7188  34.901  < 2e-16 ***
## hour19                        103.7854     3.7881  27.398  < 2e-16 ***
## hour20                         80.0564     3.8680  20.697  < 2e-16 ***
## hour21                         61.3715     3.9424  15.567  < 2e-16 ***
## hour22                         43.0206     4.0359  10.659  < 2e-16 ***
## hour23                         22.7684     4.1510   5.485 4.16e-08 ***
## Subscription.TypeRegistered    79.5777     1.0847  73.364  < 2e-16 ***
## Mean.TemperatureF             -21.6701     3.0046  -7.212 5.60e-13 ***
## Mean.Humidity                 -42.2389     7.5860  -5.568 2.59e-08 ***
## Mean.Sea.Level.PressureIn       3.4181     3.0947   1.105 0.269373    
## Mean.VisibilityMiles            3.0808     0.6550   4.703 2.57e-06 ***
## Mean.Wind.SpeedMPH             -0.4087     0.1781  -2.295 0.021727 *  
## new_precipitation             -10.9639     2.6156  -4.192 2.77e-05 ***
## CloudCover1                    -5.0851     4.5846  -1.109 0.267364    
## CloudCover2                    -2.4634     4.5040  -0.547 0.584419    
## CloudCover3                     2.3624     4.2291   0.559 0.576427    
## CloudCover4                     0.5633     4.3212   0.130 0.896284    
## CloudCover5                     2.4764     4.2147   0.588 0.556821    
## CloudCover6                     3.6226     4.2587   0.851 0.394973    
## CloudCover7                     1.4302     4.2217   0.339 0.734786    
## CloudCover8                    -4.4144     4.4680  -0.988 0.323163    
## EventsFog                       8.4081     5.9452   1.414 0.157290    
## EventsFog-Rain                  8.5135     4.1056   2.074 0.038118 *  
## EventsFog-Rain-Snow             9.0902     9.5589   0.951 0.341628    
## EventsFog-Rain-Thunderstorm    33.6352    10.7748   3.122 0.001800 ** 
## EventsFog-Snow                 -1.2149    12.8963  -0.094 0.924944    
## EventsRain                      1.5175     2.3670   0.641 0.521466    
## EventsRain-Hail-Thunderstorm   10.5665     7.9060   1.337 0.181391    
## EventsRain-Snow                -4.5116     4.3509  -1.037 0.299769    
## EventsRain-Thunderstorm         4.9787     3.1364   1.587 0.112430    
## EventsSnow                      1.8704     3.9680   0.471 0.637368    
## EventsThunderstorm             -1.1768     6.5843  -0.179 0.858158    
## CityArlington                  32.9890     1.6346  20.182  < 2e-16 ***
## CityBethesda                  -10.0073     1.8886  -5.299 1.17e-07 ***
## CityChevy Chase               -25.8668     3.1785  -8.138 4.14e-16 ***
## CityDerwood                   -52.9897     5.6771  -9.334  < 2e-16 ***
## CityRockville                 -24.6696     2.4590 -10.032  < 2e-16 ***
## CitySilver Spring             -26.1787     2.3017 -11.374  < 2e-16 ***
## CityTakoma Park               -21.1620     2.4000  -8.818  < 2e-16 ***
## CityWashington                206.5884     1.5924 129.736  < 2e-16 ***
## weekdayTuesday                 -1.0615     1.9311  -0.550 0.582537    
## weekdayWednesday                2.4662     1.9250   1.281 0.200151    
## weekdayThursday                 0.9640     1.9180   0.503 0.615254    
## weekdayFriday                   4.5937     1.9357   2.373 0.017643 *  
## weekdaySaturday                14.3572     2.7417   5.237 1.64e-07 ***
## weekdaySunday                   9.6747     2.7510   3.517 0.000437 ***
## weekend1                            NA         NA      NA       NA    
## rushhour1                           NA         NA      NA       NA    
## weekend_holiday1               -8.3739     2.2553  -3.713 0.000205 ***
## feellike                       25.9817     3.4620   7.505 6.27e-14 ***
## seasonSpring                    3.3573     1.7179   1.954 0.050675 .  
## seasonSummer                    0.6555     2.1158   0.310 0.756699    
## seasonWinter                  -15.4683     2.1380  -7.235 4.73e-13 ***
## AdverseWeatherTrue                  NA         NA      NA       NA    
## BeautifulWeatherTrue            6.6606     2.1143   3.150 0.001633 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 97.81 on 39920 degrees of freedom
## Multiple R-squared:  0.4771, Adjusted R-squared:  0.4762 
## F-statistic: 527.9 on 69 and 39920 DF,  p-value: < 2.2e-16
str(master_df)
## 'data.frame':    3192908 obs. of  49 variables:
##  $ month                    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Start.station            : Factor w/ 364 levels "10th & E St NW",..: 244 4 252 272 234 56 46 85 240 264 ...
##  $ date                     : Date, format: "2015-01-15" "2015-01-10" ...
##  $ Total.duration..ms.      : num  651115 359735 508262 804681 3882809 ...
##  $ Start.date               : POSIXct, format: "2015-01-15 19:44:00" "2015-01-10 18:58:00" ...
##  $ End.date                 : POSIXct, format: "2015-01-15 19:55:00" "2015-01-10 19:04:00" ...
##  $ End.station              : Factor w/ 364 levels "10th & E St NW",..: 348 150 299 91 234 36 48 240 26 342 ...
##  $ Bike.number              : Factor w/ 3582 levels "W00005","W00006",..: 2540 2931 2418 1252 2711 3169 1265 915 264 1650 ...
##  $ Subscription.Type        : Factor w/ 2 levels "Casual","Registered": 2 2 2 2 1 2 2 2 2 2 ...
##  $ Max.TemperatureF         : int  42 30 67 52 67 26 43 30 43 37 ...
##  $ Mean.TemperatureF        : int  37 25 55 44 55 19 33 23 37 34 ...
##  $ Min.TemperatureF         : int  32 19 42 36 42 12 23 15 31 30 ...
##  $ Max.Dew.PointF           : int  27 6 55 32 55 2 29 18 34 24 ...
##  $ MeanDew.PointF           : int  23 -1 45 29 45 -4 18 5 29 21 ...
##  $ Min.DewpointF            : int  19 -5 31 26 31 -8 3 -10 23 14 ...
##  $ Max.Humidity             : int  75 43 89 64 89 42 64 68 85 78 ...
##  $ Mean.Humidity            : num  0.6 0.32 0.68 0.53 0.68 0.36 0.47 0.49 0.67 0.65 ...
##  $ Min.Humidity             : int  45 21 46 41 46 29 29 30 49 51 ...
##  $ Max.Sea.Level.PressureIn : num  30.2 30.6 30.1 29.9 30.1 ...
##  $ Mean.Sea.Level.PressureIn: num  30.1 30.6 29.9 29.8 29.9 ...
##  $ Min.Sea.Level.PressureIn : num  30 30.4 29.7 29.7 29.7 ...
##  $ Max.VisibilityMiles      : int  10 10 10 10 10 10 10 10 10 10 ...
##  $ Mean.VisibilityMiles     : int  10 10 8 10 8 10 9 10 7 9 ...
##  $ Min.VisibilityMiles      : int  9 10 2 10 2 10 2 10 2 4 ...
##  $ Max.Wind.SpeedMPH        : int  12 20 26 15 26 24 31 30 13 22 ...
##  $ Mean.Wind.SpeedMPH       : int  6 8 13 6 13 11 16 15 5 15 ...
##  $ Max.Gust.SpeedMPH        : int  23 25 39 22 39 31 41 40 16 32 ...
##  $ PrecipitationIn          : num  0 0 0.2 0 0.2 0 NA NA 0.65 0.01 ...
##  $ CloudCover               : Factor w/ 9 levels "0","1","2","3",..: 6 1 9 8 9 4 6 6 8 7 ...
##  $ Events                   : Factor w/ 12 levels "","Fog","Fog-Rain",..: 1 1 7 1 7 1 11 11 9 11 ...
##  $ WindDirDegrees           : int  288 321 219 209 219 258 317 297 156 328 ...
##  $ LATITUDE                 : num  38.9 38.9 39 38.9 38.9 ...
##  $ LONGITUDE                : num  -77 -77 -77.1 -77 -77 ...
##  $ Address                  : Factor w/ 424 levels "1-3 Atlantic St SW",..: 151 163 296 115 263 141 118 197 46 112 ...
##  $ City                     : Factor w/ 14 levels "Alexandria","Arlington",..: 14 14 3 14 14 14 14 14 14 2 ...
##  $ State                    : Factor w/ 3 levels "DC","MD","VA": 1 1 2 1 1 1 1 1 1 3 ...
##  $ Zip                      : Factor w/ 60 levels "","20001","20002",..: 22 2 36 10 1 4 7 23 8 54 ...
##  $ Country                  : Factor w/ 1 level " USA": 1 1 1 1 1 1 1 1 1 1 ...
##  $ hour                     : Factor w/ 24 levels "0","1","2","3",..: 20 19 16 20 13 18 19 20 18 9 ...
##  $ weekday                  : Factor w/ 7 levels "Monday","Tuesday",..: 4 6 7 7 7 4 5 3 5 2 ...
##  $ weekend                  : Factor w/ 2 levels "0","1": 1 2 2 2 2 1 1 1 1 1 ...
##  $ rushhour                 : Factor w/ 2 levels "0","1": 2 2 1 2 1 2 2 2 2 2 ...
##  $ holiday                  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ weekend_holiday          : Factor w/ 2 levels "0","1": 1 2 2 2 2 1 1 1 1 1 ...
##  $ feellike                 : num  36.7 25.3 52.6 42.8 52.6 ...
##  $ season                   : Factor w/ 4 levels "Fall","Spring",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ AdverseWeather           : Factor w/ 2 levels "False","True": 1 1 2 1 2 1 2 2 2 2 ...
##  $ BeautifulWeather         : Factor w/ 2 levels "False","True": 1 1 1 1 1 1 1 1 1 1 ...
##  $ new_precipitation        : num  0 0 0.2 0 0.2 ...
anova(m_full_linear)
## Analysis of Variance Table
## 
## Response: total_rides
##                              Df    Sum Sq  Mean Sq   F value    Pr(>F)    
## hour                         23  19225180   835877   87.3814 < 2.2e-16 ***
## Subscription.Type             1   9255166  9255166  967.5212 < 2.2e-16 ***
## Mean.TemperatureF             1   3282030  3282030  343.0985 < 2.2e-16 ***
## Mean.Humidity                 1    585849   585849   61.2438 5.166e-15 ***
## Mean.Sea.Level.PressureIn     1     93292    93292    9.7526 0.0017920 ** 
## Mean.VisibilityMiles          1    185482   185482   19.3900 1.068e-05 ***
## Mean.Wind.SpeedMPH            1     37555    37555    3.9259 0.0475540 *  
## new_precipitation             1    107181   107181   11.2045 0.0008167 ***
## CloudCover                    8    112447    14056    1.4694 0.1625118    
## Events                       11    128281    11662    1.2191 0.2673989    
## City                          8 312264428 39033054 4080.4568 < 2.2e-16 ***
## weekday                       6    251930    41988    4.3894 0.0001933 ***
## weekend_holiday               1    156659   156659   16.3769 5.201e-05 ***
## feellike                      1   1897634  1897634  198.3758 < 2.2e-16 ***
## season                        3    769646   256549   26.8192 < 2.2e-16 ***
## BeautifulWeather              1     94930    94930    9.9238 0.0016327 ** 
## Residuals                 39920 381868889     9566                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
n <- nrow(na.omit(linearreg_train))
stepAIC(m_full_linear, k = log(n)) # BIC: penalty of log(n) per parameter
## Start:  AIC=367218.2
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + Events + 
##     City + weekday + weekend + rushhour + weekend_holiday + feellike + 
##     season + AdverseWeather + BeautifulWeather
## 
## 
## Step:  AIC=367218.2
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + Events + 
##     City + weekday + weekend + rushhour + weekend_holiday + feellike + 
##     season + BeautifulWeather
## 
## 
## Step:  AIC=367218.2
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + Events + 
##     City + weekday + weekend + weekend_holiday + feellike + season + 
##     BeautifulWeather
## 
## 
## Step:  AIC=367218.2
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + Events + 
##     City + weekday + weekend_holiday + feellike + season + BeautifulWeather
## 
##                             Df Sum of Sq       RSS    AIC
## - Events                    11    192469 382061358 367122
## - CloudCover                 8    254980 382123869 367160
## - weekday                    6    351598 382220487 367191
## - Mean.Sea.Level.PressureIn  1     11670 381880560 367209
## - Mean.Wind.SpeedMPH         1     50392 381919282 367213
## - BeautifulWeather           1     94930 381963819 367218
## <none>                                   381868889 367218
## - weekend_holiday            1    131879 382000768 367221
## - new_precipitation          1    168076 382036966 367225
## - Mean.VisibilityMiles       1    211617 382080507 367230
## - Mean.Humidity              1    296564 382165454 367239
## - Mean.TemperatureF          1    497591 382366480 367260
## - season                     3    702779 382571668 367260
## - feellike                   1    538786 382407675 367264
## - Subscription.Type          1  51485750 433354639 372266
## - hour                      23  78275757 460144647 374431
## - City                       8 314377810 696246699 391153
## 
## Step:  AIC=367121.8
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + City + 
##     weekday + weekend_holiday + feellike + season + BeautifulWeather
## 
##                             Df Sum of Sq       RSS    AIC
## - CloudCover                 8    271415 382332773 367065
## - weekday                    6    361679 382423037 367096
## - Mean.Sea.Level.PressureIn  1      9607 382070965 367112
## - Mean.Wind.SpeedMPH         1     69908 382131266 367119
## <none>                                   382061358 367122
## - weekend_holiday            1    119617 382180975 367124
## - new_precipitation          1    129748 382191106 367125
## - BeautifulWeather           1    141105 382202463 367126
## - Mean.VisibilityMiles       1    232143 382293501 367136
## - Mean.Humidity              1    299748 382361106 367143
## - season                     3    749999 382811357 367168
## - Mean.TemperatureF          1    563989 382625347 367170
## - feellike                   1    614706 382676064 367176
## - Subscription.Type          1  51452393 433513751 372164
## - hour                      23  78304274 460365632 374334
## - City                       8 314288739 696350097 391042
## 
## Step:  AIC=367065.5
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + City + weekday + 
##     weekend_holiday + feellike + season + BeautifulWeather
## 
##                             Df Sum of Sq       RSS    AIC
## - weekday                    6    333120 382665894 367037
## - Mean.Sea.Level.PressureIn  1       510 382333284 367055
## <none>                                   382332773 367065
## - weekend_holiday            1    102626 382435399 367066
## - Mean.Wind.SpeedMPH         1    129272 382462046 367068
## - BeautifulWeather           1    155190 382487963 367071
## - new_precipitation          1    175125 382507898 367073
## - Mean.VisibilityMiles       1    271414 382604187 367083
## - Mean.Humidity              1    494402 382827175 367107
## - Mean.TemperatureF          1    529365 382862138 367110
## - feellike                   1    581574 382914348 367116
## - season                     3    806703 383139476 367118
## - Subscription.Type          1  51384468 433717241 372098
## - hour                      23  78276818 460609591 374270
## - City                       8 314077462 696410235 390961
## 
## Step:  AIC=367036.7
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + City + weekend_holiday + 
##     feellike + season + BeautifulWeather
## 
##                             Df Sum of Sq       RSS    AIC
## - Mean.Sea.Level.PressureIn  1       286 382666180 367026
## - weekend_holiday            1     11393 382677287 367027
## <none>                                   382665894 367037
## - Mean.Wind.SpeedMPH         1    132531 382798424 367040
## - BeautifulWeather           1    144342 382810236 367041
## - new_precipitation          1    152719 382818613 367042
## - Mean.VisibilityMiles       1    250936 382916830 367052
## - Mean.TemperatureF          1    531313 383197207 367082
## - feellike                   1    583548 383249442 367087
## - Mean.Humidity              1    602672 383268565 367089
## - season                     3    858329 383524223 367095
## - Subscription.Type          1  51323322 433989216 372059
## - hour                      23  78187421 460853315 374228
## - City                       8 313954562 696620456 390909
## 
## Step:  AIC=367026.1
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
##     new_precipitation + City + weekend_holiday + feellike + season + 
##     BeautifulWeather
## 
##                        Df Sum of Sq       RSS    AIC
## - weekend_holiday       1     11985 382678166 367017
## <none>                              382666180 367026
## - Mean.Wind.SpeedMPH    1    145238 382811419 367031
## - BeautifulWeather      1    146404 382812585 367031
## - new_precipitation     1    153609 382819790 367032
## - Mean.VisibilityMiles  1    251761 382917941 367042
## - Mean.TemperatureF     1    534751 383200931 367071
## - feellike              1    587942 383254122 367077
## - Mean.Humidity         1    603853 383270034 367079
## - season                3    880246 383546426 367086
## - Subscription.Type     1  51326559 433992740 372049
## - hour                 23  78188439 460854619 374217
## - City                  8 313954275 696620456 390898
## 
## Step:  AIC=367016.8
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
##     new_precipitation + City + feellike + season + BeautifulWeather
## 
##                        Df Sum of Sq       RSS    AIC
## <none>                              382678166 367017
## - Mean.Wind.SpeedMPH    1    150591 382828756 367022
## - new_precipitation     1    153158 382831324 367022
## - BeautifulWeather      1    154392 382832557 367022
## - Mean.VisibilityMiles  1    256627 382934793 367033
## - Mean.TemperatureF     1    532026 383210192 367062
## - feellike              1    584923 383263089 367067
## - Mean.Humidity         1    598349 383276515 367069
## - season                3    881159 383559325 367077
## - Subscription.Type     1  51481685 434159851 372054
## - hour                 23  78209352 460887517 374210
## - City                  8 313962972 696641138 390889
## 
## Call:
## lm(formula = total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
##     new_precipitation + City + feellike + season + BeautifulWeather, 
##     data = na.omit(linearreg_train))
## 
## Coefficients:
##                 (Intercept)                        hour1  
##                   -299.5995                     -28.0048  
##                       hour2                        hour3  
##                    -39.9391                     -71.0196  
##                       hour4                        hour5  
##                    -66.7942                      -5.3131  
##                       hour6                        hour7  
##                     37.4923                      87.6110  
##                       hour8                        hour9  
##                    116.2905                      86.8799  
##                      hour10                       hour11  
##                     77.5900                      87.0558  
##                      hour12                       hour13  
##                     97.9970                      97.2500  
##                      hour14                       hour15  
##                     94.7294                     100.1506  
##                      hour16                       hour17  
##                    112.4641                     138.4482  
##                      hour18                       hour19  
##                    129.4917                     103.4653  
##                      hour20                       hour21  
##                     79.7041                      60.9151  
##                      hour22                       hour23  
##                     42.7005                      22.3699  
## Subscription.TypeRegistered            Mean.TemperatureF  
##                     79.3157                     -20.4128  
##               Mean.Humidity         Mean.VisibilityMiles  
##                    -51.4137                       2.6898  
##          Mean.Wind.SpeedMPH            new_precipitation  
##                     -0.6479                      -7.7999  
##               CityArlington                 CityBethesda  
##                     32.8642                     -10.0070  
##             CityChevy Chase                  CityDerwood  
##                    -25.6455                     -53.4749  
##               CityRockville            CitySilver Spring  
##                    -25.0290                     -26.2259  
##             CityTakoma Park               CityWashington  
##                    -21.3188                     206.3001  
##                    feellike                 seasonSpring  
##                     24.6193                       2.8306  
##                seasonSummer                 seasonWinter  
##                     -1.0607                     -16.2409  
##        BeautifulWeatherTrue  
##                      5.0319
stepAIC(m_full_linear, k = 2) # AIC: penalty of 2 per parameter
## Start:  AIC=366616.5
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + Events + 
##     City + weekday + weekend + rushhour + weekend_holiday + feellike + 
##     season + AdverseWeather + BeautifulWeather
## 
## 
## Step:  AIC=366616.5
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + Events + 
##     City + weekday + weekend + rushhour + weekend_holiday + feellike + 
##     season + BeautifulWeather
## 
## 
## Step:  AIC=366616.5
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + Events + 
##     City + weekday + weekend + weekend_holiday + feellike + season + 
##     BeautifulWeather
## 
## 
## Step:  AIC=366616.5
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + Events + 
##     City + weekday + weekend_holiday + feellike + season + BeautifulWeather
## 
##                             Df Sum of Sq       RSS    AIC
## - Events                    11    192469 382061358 366615
## - Mean.Sea.Level.PressureIn  1     11670 381880560 366616
## <none>                                   381868889 366616
## - Mean.Wind.SpeedMPH         1     50392 381919282 366620
## - BeautifulWeather           1     94930 381963819 366624
## - CloudCover                 8    254980 382123869 366627
## - weekend_holiday            1    131879 382000768 366628
## - new_precipitation          1    168076 382036966 366632
## - Mean.VisibilityMiles       1    211617 382080507 366637
## - weekday                    6    351598 382220487 366641
## - Mean.Humidity              1    296564 382165454 366646
## - Mean.TemperatureF          1    497591 382366480 366667
## - feellike                   1    538786 382407675 366671
## - season                     3    702779 382571668 366684
## - Subscription.Type          1  51485750 433354639 371672
## - hour                      23  78275757 460144647 374027
## - City                       8 314377810 696246699 390620
## 
## Step:  AIC=366614.6
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + City + 
##     weekday + weekend_holiday + feellike + season + BeautifulWeather
## 
##                             Df Sum of Sq       RSS    AIC
## - Mean.Sea.Level.PressureIn  1      9607 382070965 366614
## <none>                                   382061358 366615
## - Mean.Wind.SpeedMPH         1     69908 382131266 366620
## - weekend_holiday            1    119617 382180975 366625
## - new_precipitation          1    129748 382191106 366626
## - CloudCover                 8    271415 382332773 366627
## - BeautifulWeather           1    141105 382202463 366627
## - Mean.VisibilityMiles       1    232143 382293501 366637
## - weekday                    6    361679 382423037 366640
## - Mean.Humidity              1    299748 382361106 366644
## - Mean.TemperatureF          1    563989 382625347 366672
## - feellike                   1    614706 382676064 366677
## - season                     3    749999 382811357 366687
## - Subscription.Type          1  51452393 433513751 371665
## - hour                      23  78304274 460365632 374024
## - City                       8 314288739 696350097 390603
## 
## Step:  AIC=366613.6
## total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
##     new_precipitation + CloudCover + City + weekday + weekend_holiday + 
##     feellike + season + BeautifulWeather
## 
##                        Df Sum of Sq       RSS    AIC
## <none>                              382070965 366614
## - Mean.Wind.SpeedMPH    1     90856 382161821 366621
## - weekend_holiday       1    114682 382185646 366624
## - CloudCover            8    262319 382333284 366625
## - new_precipitation     1    134439 382205403 366626
## - BeautifulWeather      1    146779 382217744 366627
## - Mean.VisibilityMiles  1    236692 382307657 366636
## - weekday               6    359883 382430848 366639
## - Mean.Humidity         1    304016 382374981 366643
## - Mean.TemperatureF     1    554394 382625358 366670
## - feellike              1    605115 382676080 366675
## - season                3    787366 382858330 366690
## - Subscription.Type     1  51442821 433513785 371663
## - hour                 23  78296246 460367211 374023
## - City                  8 314280388 696351352 390602
## 
## Call:
## lm(formula = total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
##     new_precipitation + CloudCover + City + weekday + weekend_holiday + 
##     feellike + season + BeautifulWeather, data = na.omit(linearreg_train))
## 
## Coefficients:
##                 (Intercept)                        hour1  
##                   -312.7756                     -28.1794  
##                       hour2                        hour3  
##                    -40.1420                     -70.9311  
##                       hour4                        hour5  
##                    -66.7667                      -4.7441  
##                       hour6                        hour7  
##                     37.7392                      88.0400  
##                       hour8                        hour9  
##                    116.7162                      87.0766  
##                      hour10                       hour11  
##                     77.7444                      87.2682  
##                      hour12                       hour13  
##                     98.1012                      97.4979  
##                      hour14                       hour15  
##                     94.7901                     100.3806  
##                      hour16                       hour17  
##                    112.7349                     138.7788  
##                      hour18                       hour19  
##                    129.7587                     103.6887  
##                      hour20                       hour21  
##                     79.9594                      61.2126  
##                      hour22                       hour23  
##                     42.9181                      22.6577  
## Subscription.TypeRegistered            Mean.TemperatureF  
##                     79.5258                     -21.6258  
##               Mean.Humidity         Mean.VisibilityMiles  
##                    -42.0769                       2.6222  
##          Mean.Wind.SpeedMPH            new_precipitation  
##                     -0.5222                      -7.5003  
##                 CloudCover1                  CloudCover2  
##                     -5.1220                      -2.5793  
##                 CloudCover3                  CloudCover4  
##                      2.2890                       0.2518  
##                 CloudCover5                  CloudCover6  
##                      1.9416                       3.1763  
##                 CloudCover7                  CloudCover8  
##                      0.8579                      -4.7795  
##               CityArlington                 CityBethesda  
##                     32.9744                      -9.9547  
##             CityChevy Chase                  CityDerwood  
##                    -25.7636                     -52.9329  
##               CityRockville            CitySilver Spring  
##                    -24.7075                     -26.1301  
##             CityTakoma Park               CityWashington  
##                    -21.1539                     206.5479  
##              weekdayTuesday             weekdayWednesday  
##                     -0.9617                       3.0597  
##             weekdayThursday                weekdayFriday  
##                      1.8053                       5.1011  
##             weekdaySaturday                weekdaySunday  
##                     14.3692                       9.5128  
##            weekend_holiday1                     feellike  
##                     -7.6996                      25.9732  
##                seasonSpring                 seasonSummer  
##                      3.1008                      -0.4458  
##                seasonWinter         BeautifulWeatherTrue  
##                    -15.4146                       5.1726

Results of variable selection for each technique:

Adjusted R Squared: hour + Subscription.Type + Mean.TemperatureF + Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + new_precipitation + CloudCover + City + weekday + rushhour + weekend_holiday + feellike + season + BeautifulWeather

BIC: hour + Subscription.Type + Mean.TemperatureF + Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + new_precipitation + City + feellike + season + BeautifulWeather

AIC: hour + Subscription.Type + Mean.TemperatureF + Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + new_precipitation + CloudCover + City + weekday + weekend_holiday + feellike + season + BeautifulWeather
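
As a rough illustration of how the AIC and BIC selections differ, the sketch below runs backward elimination with `stats::step()` on the built-in `mtcars` data rather than the bikeshare training set. The only change between the two runs is the penalty `k`: `k = 2` corresponds to AIC and `k = log(n)` to BIC. The model formula and object names here are hypothetical, and this is not necessarily the exact call used to produce the selections above.

```r
# Illustrative only: built-in mtcars data, not the bikeshare training set.
full_fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
n <- nrow(mtcars)

# k = 2 gives the AIC penalty; k = log(n) gives the (heavier) BIC penalty.
aic_fit <- step(full_fit, direction = "backward", k = 2, trace = 0)
bic_fit <- step(full_fit, direction = "backward", k = log(n), trace = 0)

# Compare the formulas retained under each criterion.
formula(aic_fit)
formula(bic_fit)
```

Because the BIC penalty grows with the sample size, it tends to keep fewer terms than AIC, which matches the shorter BIC variable list reported here.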

Based on these results, models were created for each technique.

m_full_linear <- lm(total_rides ~ hour + Subscription.Type + Mean.TemperatureF + Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + new_precipitation + CloudCover + City + weekday + rushhour + weekend_holiday + feellike + season + BeautifulWeather, data = na.omit(linearreg_train))
summary(m_full_linear)
## 
## Call:
## lm(formula = total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.Sea.Level.PressureIn + Mean.VisibilityMiles + 
##     Mean.Wind.SpeedMPH + new_precipitation + CloudCover + City + 
##     weekday + rushhour + weekend_holiday + feellike + season + 
##     BeautifulWeather, data = na.omit(linearreg_train))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -227.73  -48.37  -14.50   30.24  914.40 
## 
## Coefficients: (1 not defined because of singularities)
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -407.87203   96.35088  -4.233 2.31e-05 ***
## hour1                        -28.16655    4.63810  -6.073 1.27e-09 ***
## hour2                        -40.15203    4.96819  -8.082 6.56e-16 ***
## hour3                        -70.93993    5.53616 -12.814  < 2e-16 ***
## hour4                        -66.76924    5.36114 -12.454  < 2e-16 ***
## hour5                         -4.72478    4.47359  -1.056 0.290906    
## hour6                         37.74438    4.05901   9.299  < 2e-16 ***
## hour7                         88.06122    3.84926  22.877  < 2e-16 ***
## hour8                        116.72090    3.74663  31.154  < 2e-16 ***
## hour9                         87.08657    3.75745  23.177  < 2e-16 ***
## hour10                        77.74462    3.77619  20.588  < 2e-16 ***
## hour11                        87.27396    3.76751  23.165  < 2e-16 ***
## hour12                        98.10265    3.77188  26.009  < 2e-16 ***
## hour13                        97.50103    3.76604  25.890  < 2e-16 ***
## hour14                        94.79978    3.78427  25.051  < 2e-16 ***
## hour15                       100.40084    3.76648  26.656  < 2e-16 ***
## hour16                       112.75122    3.72985  30.229  < 2e-16 ***
## hour17                       138.79079    3.70094  37.501  < 2e-16 ***
## hour18                       129.77452    3.71909  34.894  < 2e-16 ***
## hour19                       103.71492    3.78834  27.377  < 2e-16 ***
## hour20                        79.97934    3.86823  20.676  < 2e-16 ***
## hour21                        61.22147    3.94239  15.529  < 2e-16 ***
## hour22                        42.92956    4.03620  10.636  < 2e-16 ***
## hour23                        22.65927    4.15104   5.459 4.82e-08 ***
## Subscription.TypeRegistered   79.53976    1.08466  73.332  < 2e-16 ***
## Mean.TemperatureF            -21.98741    2.86385  -7.678 1.66e-14 ***
## Mean.Humidity                -41.80755    7.46944  -5.597 2.19e-08 ***
## Mean.Sea.Level.PressureIn      3.04969    3.04357   1.002 0.316344    
## Mean.VisibilityMiles           2.59935    0.52771   4.926 8.44e-07 ***
## Mean.Wind.SpeedMPH            -0.47524    0.17582  -2.703 0.006874 ** 
## new_precipitation             -7.38128    2.00444  -3.682 0.000231 ***
## CloudCover1                   -5.10061    4.57809  -1.114 0.265228    
## CloudCover2                   -2.44969    4.50222  -0.544 0.586371    
## CloudCover3                    2.44002    4.22101   0.578 0.563224    
## CloudCover4                    0.47551    4.30998   0.110 0.912150    
## CloudCover5                    2.17559    4.19920   0.518 0.604394    
## CloudCover6                    3.45778    4.23727   0.816 0.414482    
## CloudCover7                    1.09245    4.18520   0.261 0.794073    
## CloudCover8                   -4.75207    4.38947  -1.083 0.278990    
## CityArlington                 32.96667    1.63465  20.167  < 2e-16 ***
## CityBethesda                  -9.97142    1.88864  -5.280 1.30e-07 ***
## CityChevy Chase              -25.77495    3.17858  -8.109 5.25e-16 ***
## CityDerwood                  -52.95412    5.67729  -9.327  < 2e-16 ***
## CityRockville                -24.72285    2.45899 -10.054  < 2e-16 ***
## CitySilver Spring            -26.13397    2.30150 -11.355  < 2e-16 ***
## CityTakoma Park              -21.15562    2.40001  -8.815  < 2e-16 ***
## CityWashington               206.54416    1.59247 129.701  < 2e-16 ***
## weekdayTuesday                -0.83838    1.91423  -0.438 0.661410    
## weekdayWednesday               3.10669    1.89062   1.643 0.100348    
## weekdayThursday                1.83079    1.88705   0.970 0.331959    
## weekdayFriday                  5.14735    1.90212   2.706 0.006811 ** 
## weekdaySaturday               14.49074    2.73438   5.299 1.17e-07 ***
## weekdaySunday                  9.62622    2.73897   3.515 0.000441 ***
## rushhour1                           NA         NA      NA       NA    
## weekend_holiday1              -7.89313    2.23236  -3.536 0.000407 ***
## feellike                      26.40220    3.29396   8.015 1.13e-15 ***
## seasonSpring                   3.26512    1.69344   1.928 0.053850 .  
## seasonSummer                  -0.05362    2.08692  -0.026 0.979502    
## seasonWinter                 -15.12865    2.06506  -7.326 2.42e-13 ***
## BeautifulWeatherTrue           5.08320    1.32366   3.840 0.000123 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 97.82 on 39931 degrees of freedom
## Multiple R-squared:  0.4769, Adjusted R-squared:  0.4761 
## F-statistic: 627.5 on 58 and 39931 DF,  p-value: < 2.2e-16
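
The `NA` row for `rushhour1` in the summary above means that the column is an exact linear combination of other predictors, so its coefficient cannot be estimated. A plausible cause, stated here as an assumption, is that the rush-hour flag was derived directly from `hour`, which is already in the model as a factor. The toy sketch below (hypothetical data and names) reproduces that situation and uses `alias()` to show the dependency.

```r
# Toy data: a flag derived purely from a factor that is also in the model.
set.seed(1)
toy <- data.frame(hour = factor(rep(0:3, each = 5)))
toy$rush <- as.integer(toy$hour %in% c("2", "3"))  # 1 for hours 2 and 3
toy$y <- rnorm(nrow(toy))

fit <- lm(y ~ hour + rush, data = toy)

coef(fit)            # the coefficient for rush is NA
alias(fit)$Complete  # expresses rush in terms of the hour dummies
```

Dropping the redundant term leaves the fitted values unchanged and avoids the rank-deficiency warning that `predict()` raises for such a fit.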
BIC_model <- lm(total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
    Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
    new_precipitation + City + feellike + season + BeautifulWeather, data = na.omit(linearreg_train))
summary(BIC_model)
## 
## Call:
## lm(formula = total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
##     new_precipitation + City + feellike + season + BeautifulWeather, 
##     data = na.omit(linearreg_train))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -227.74  -48.51  -14.50   30.39  912.98 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -299.5995    15.6944 -19.090  < 2e-16 ***
## hour1                        -28.0048     4.6400  -6.035 1.60e-09 ***
## hour2                        -39.9391     4.9702  -8.036 9.56e-16 ***
## hour3                        -71.0196     5.5378 -12.824  < 2e-16 ***
## hour4                        -66.7942     5.3635 -12.453  < 2e-16 ***
## hour5                         -5.3131     4.4740  -1.188   0.2350    
## hour6                         37.4923     4.0597   9.235  < 2e-16 ***
## hour7                         87.6110     3.8486  22.764  < 2e-16 ***
## hour8                        116.2905     3.7467  31.038  < 2e-16 ***
## hour9                         86.8799     3.7587  23.114  < 2e-16 ***
## hour10                        77.5900     3.7778  20.538  < 2e-16 ***
## hour11                        87.0558     3.7693  23.096  < 2e-16 ***
## hour12                        97.9970     3.7739  25.967  < 2e-16 ***
## hour13                        97.2500     3.7679  25.810  < 2e-16 ***
## hour14                        94.7294     3.7862  25.020  < 2e-16 ***
## hour15                       100.1506     3.7682  26.578  < 2e-16 ***
## hour16                       112.4641     3.7315  30.139  < 2e-16 ***
## hour17                       138.4482     3.7017  37.401  < 2e-16 ***
## hour18                       129.4917     3.7196  34.814  < 2e-16 ***
## hour19                       103.4653     3.7893  27.305  < 2e-16 ***
## hour20                        79.7041     3.8693  20.599  < 2e-16 ***
## hour21                        60.9151     3.9436  15.446  < 2e-16 ***
## hour22                        42.7005     4.0374  10.576  < 2e-16 ***
## hour23                        22.3699     4.1526   5.387 7.21e-08 ***
## Subscription.TypeRegistered   79.3157     1.0820  73.308  < 2e-16 ***
## Mean.TemperatureF            -20.4128     2.7391  -7.452 9.36e-14 ***
## Mean.Humidity                -51.4137     6.5054  -7.903 2.79e-15 ***
## Mean.VisibilityMiles           2.6898     0.5197   5.176 2.28e-07 ***
## Mean.Wind.SpeedMPH            -0.6479     0.1634  -3.965 7.36e-05 ***
## new_precipitation             -7.7999     1.9507  -3.998 6.39e-05 ***
## CityArlington                 32.8642     1.6355  20.094  < 2e-16 ***
## CityBethesda                 -10.0070     1.8895  -5.296 1.19e-07 ***
## CityChevy Chase              -25.6455     3.1798  -8.065 7.52e-16 ***
## CityDerwood                  -53.4749     5.6761  -9.421  < 2e-16 ***
## CityRockville                -25.0290     2.4579 -10.183  < 2e-16 ***
## CitySilver Spring            -26.2259     2.3023 -11.391  < 2e-16 ***
## CityTakoma Park              -21.3188     2.4004  -8.881  < 2e-16 ***
## CityWashington               206.3001     1.5929 129.513  < 2e-16 ***
## feellike                      24.6193     3.1507   7.814 5.67e-15 ***
## seasonSpring                   2.8306     1.6102   1.758   0.0788 .  
## seasonSummer                  -1.0607     2.0147  -0.526   0.5986    
## seasonWinter                 -16.2409     1.9760  -8.219  < 2e-16 ***
## BeautifulWeatherTrue           5.0319     1.2534   4.015 5.97e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 97.88 on 39947 degrees of freedom
## Multiple R-squared:  0.476,  Adjusted R-squared:  0.4755 
## F-statistic:   864 on 42 and 39947 DF,  p-value: < 2.2e-16
AIC_model <- lm(total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
    Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
    new_precipitation + CloudCover + City + weekday + weekend_holiday +feellike + 
    season + BeautifulWeather, data = na.omit(linearreg_train))
summary(AIC_model)
## 
## Call:
## lm(formula = total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
##     Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
##     new_precipitation + CloudCover + City + weekday + weekend_holiday + 
##     feellike + season + BeautifulWeather, data = na.omit(linearreg_train))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -227.26  -48.35  -14.46   30.26  914.03 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -312.7756    16.6262 -18.812  < 2e-16 ***
## hour1                        -28.1794     4.6381  -6.076 1.25e-09 ***
## hour2                        -40.1420     4.9682  -8.080 6.67e-16 ***
## hour3                        -70.9311     5.5362 -12.812  < 2e-16 ***
## hour4                        -66.7667     5.3611 -12.454  < 2e-16 ***
## hour5                         -4.7441     4.4736  -1.060 0.288929    
## hour6                         37.7392     4.0590   9.298  < 2e-16 ***
## hour7                         88.0400     3.8492  22.872  < 2e-16 ***
## hour8                        116.7162     3.7466  31.152  < 2e-16 ***
## hour9                         87.0766     3.7574  23.174  < 2e-16 ***
## hour10                        77.7444     3.7762  20.588  < 2e-16 ***
## hour11                        87.2682     3.7675  23.163  < 2e-16 ***
## hour12                        98.1012     3.7719  26.009  < 2e-16 ***
## hour13                        97.4979     3.7660  25.889  < 2e-16 ***
## hour14                        94.7901     3.7843  25.049  < 2e-16 ***
## hour15                       100.3806     3.7664  26.651  < 2e-16 ***
## hour16                       112.7349     3.7298  30.225  < 2e-16 ***
## hour17                       138.7788     3.7009  37.498  < 2e-16 ***
## hour18                       129.7587     3.7191  34.890  < 2e-16 ***
## hour19                       103.6887     3.7882  27.371  < 2e-16 ***
## hour20                        79.9594     3.8682  20.671  < 2e-16 ***
## hour21                        61.2126     3.9424  15.527  < 2e-16 ***
## hour22                        42.9181     4.0362  10.633  < 2e-16 ***
## hour23                        22.6577     4.1510   5.458 4.83e-08 ***
## Subscription.TypeRegistered   79.5258     1.0846  73.325  < 2e-16 ***
## Mean.TemperatureF            -21.6258     2.8410  -7.612 2.76e-14 ***
## Mean.Humidity                -42.0769     7.4646  -5.637 1.74e-08 ***
## Mean.VisibilityMiles           2.6222     0.5272   4.974 6.60e-07 ***
## Mean.Wind.SpeedMPH            -0.5222     0.1695  -3.082 0.002061 ** 
## new_precipitation             -7.5003     2.0009  -3.748 0.000178 ***
## CloudCover1                   -5.1220     4.5780  -1.119 0.263229    
## CloudCover2                   -2.5793     4.5004  -0.573 0.566553    
## CloudCover3                    2.2890     4.2183   0.543 0.587393    
## CloudCover4                    0.2518     4.3042   0.058 0.953355    
## CloudCover5                    1.9416     4.1927   0.463 0.643299    
## CloudCover6                    3.1763     4.2279   0.751 0.452497    
## CloudCover7                    0.8579     4.1787   0.205 0.837343    
## CloudCover8                   -4.7795     4.3894  -1.089 0.276212    
## CityArlington                 32.9744     1.6346  20.172  < 2e-16 ***
## CityBethesda                  -9.9547     1.8886  -5.271 1.36e-07 ***
## CityChevy Chase              -25.7636     3.1786  -8.105 5.40e-16 ***
## CityDerwood                  -52.9329     5.6773  -9.324  < 2e-16 ***
## CityRockville                -24.7075     2.4589 -10.048  < 2e-16 ***
## CitySilver Spring            -26.1301     2.3015 -11.354  < 2e-16 ***
## CityTakoma Park              -21.1539     2.4000  -8.814  < 2e-16 ***
## CityWashington               206.5479     1.5925 129.703  < 2e-16 ***
## weekdayTuesday                -0.9617     1.9103  -0.503 0.614657    
## weekdayWednesday               3.0597     1.8900   1.619 0.105490    
## weekdayThursday                1.8053     1.8869   0.957 0.338694    
## weekdayFriday                  5.1011     1.9016   2.683 0.007309 ** 
## weekdaySaturday               14.3692     2.7317   5.260 1.45e-07 ***
## weekdaySunday                  9.5128     2.7366   3.476 0.000509 ***
## weekend_holiday1              -7.6996     2.2240  -3.462 0.000537 ***
## feellike                      25.9732     3.2660   7.953 1.87e-15 ***
## seasonSpring                   3.1008     1.6855   1.840 0.065814 .  
## seasonSummer                  -0.4458     2.0499  -0.217 0.827847    
## seasonWinter                 -15.4146     2.0452  -7.537 4.92e-14 ***
## BeautifulWeatherTrue           5.1726     1.3207   3.917 8.99e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 97.82 on 39932 degrees of freedom
## Multiple R-squared:  0.4768, Adjusted R-squared:  0.4761 
## F-statistic: 638.5 on 57 and 39932 DF,  p-value: < 2.2e-16

We then used each model to make predictions on the test dataset and compared performance using mean absolute error (MAE) and root mean square error (RMSE).

Adjusted R Squared: MAE of 59.069 and RMSE of 99.523

BIC: MAE of 59.073 and RMSE of 99.501

AIC: MAE of 59.072 and RMSE of 99.526

Additionally, all three models had an Adjusted R Squared of about 0.476 on the training data (roughly 47.6% of the variance explained), which is consistent with their nearly identical performance on the test set.

mfull_pred <- predict(m_full_linear,linearreg_test)
## Warning in predict.lm(m_full_linear, linearreg_test): prediction from a
## rank-deficient fit may be misleading
linearreg_test$total_rides_mfull_pred=mfull_pred
BIC_pred <- predict(BIC_model,linearreg_test)
linearreg_test$total_rides_BIC_pred=BIC_pred
AIC_pred <- predict(AIC_model,linearreg_test)
linearreg_test$total_rides_AIC_pred=AIC_pred
MAE <- function(actual,predicted){
        mean(abs(actual - predicted), na.rm = TRUE)
}
MAE(linearreg_test$total_rides,linearreg_test$total_rides_mfull_pred)
## [1] 59.06931
MAE(linearreg_test$total_rides,linearreg_test$total_rides_BIC_pred)
## [1] 59.07309
MAE(linearreg_test$total_rides,linearreg_test$total_rides_AIC_pred) 
## [1] 59.07193
rmse(linearreg_test$total_rides,linearreg_test$total_rides_mfull_pred)
## [1] 99.52303
rmse(linearreg_test$total_rides,linearreg_test$total_rides_AIC_pred)
## [1] 99.52603
rmse(linearreg_test$total_rides,linearreg_test$total_rides_BIC_pred)
## [1] 99.50097
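
The `rmse()` function called above comes from a package that is not shown in this section. As a minimal sketch, both metrics can be written in base R and checked by hand on a small made-up example; the helper names and the four data points below are hypothetical.

```r
# Base-R versions of the two error metrics (hypothetical helper names).
mae_sketch <- function(actual, predicted) {
  mean(abs(actual - predicted), na.rm = TRUE)
}
rmse_sketch <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

actual    <- c(10, 20, 30, 40)
predicted <- c(12, 18, 33, 35)

mae_sketch(actual, predicted)   # absolute errors 2, 2, 3, 5 -> mean of 3
rmse_sketch(actual, predicted)  # sqrt((4 + 4 + 9 + 25) / 4) = sqrt(10.5)
```

RMSE squares the errors before averaging, so it weights large misses more heavily than MAE does. That is why the RMSE values above are well above the MAE values for the same models.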
\section{Model 2 - Regression Tree Model}

Regression Tree Model (using the same variables as the AIC model)

rpart_model <- rpart(total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
    Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
    new_precipitation + CloudCover + City + weekday + weekend_holiday +feellike + 
    season + BeautifulWeather, na.omit(linearreg_train))
# Plot the tree
rpart.plot(rpart_model, digits = 3, fallen.leaves = TRUE, type = 4)

fancyRpartPlot(rpart_model)

Predict on test dataset for regression tree model

rpart_pred<-predict(rpart_model,linearreg_test)
linearreg_test$total_rides_rpart_pred=rpart_pred # Save the predictions as variable total_rides_rpart_pred on test

Measure model performance with Mean Absolute Error (MAE) to evaluate the model

MAE(linearreg_test$total_rides,linearreg_test$total_rides_rpart_pred)
## [1] 22.74462

Measure model fit with Root Mean Square Error (RMSE) to evaluate the standard deviation of the model prediction error. A smaller value indicates better model performance.

rmse(linearreg_test$total_rides,linearreg_test$total_rides_rpart_pred)
## [1] 53.34946
\section{Model 3 - Random Forest Model}

Random Forest Model (using the same variables as the BIC model)

set.seed(123)
rf_model <- randomForest(total_rides ~ hour + Subscription.Type + Mean.TemperatureF + 
    Mean.Humidity + Mean.VisibilityMiles + Mean.Wind.SpeedMPH + 
    new_precipitation + City + feellike + season + BeautifulWeather, data=linearreg_train,importance=TRUE,na.action=na.omit)
rf_model
## 
## Call:
##  randomForest(formula = total_rides ~ hour + Subscription.Type +      Mean.TemperatureF + Mean.Humidity + Mean.VisibilityMiles +      Mean.Wind.SpeedMPH + new_precipitation + City + feellike +      season + BeautifulWeather, data = linearreg_train, importance = TRUE,      na.action = na.omit) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##           Mean of squared residuals: 2327.949
##                     % Var explained: 87.25
plot(rf_model, main="Random Forest") # Plot out-of-bag error (MSE) against the number of trees

importance(rf_model) # Inspect variable importance
##                        %IncMSE IncNodePurity
## hour                 193.39113     162055422
## Subscription.Type    243.99614      77937364
## Mean.TemperatureF     43.93934      20327312
## Mean.Humidity         56.23444      19899546
## Mean.VisibilityMiles  39.10003       6091913
## Mean.Wind.SpeedMPH    44.85421      13628751
## new_precipitation     40.54293       9670355
## City                 324.03633     317222045
## feellike              51.47854      26903485
## season                48.20983      12539232
## BeautifulWeather      28.24067       3528786
varImpPlot(rf_model, main="Random Forest by variable importance")
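
The `%IncMSE` column above is a permutation-based measure: it reflects how much the prediction error grows when a variable's values are shuffled. The sketch below illustrates that idea in base R with a linear model on the built-in `mtcars` data. It is a simplified stand-in and not the random forest computation itself, which works on out-of-bag samples per tree and normalizes the result. All names in the sketch are hypothetical.

```r
# Simplified permutation importance on mtcars (not the bikeshare data).
set.seed(123)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

mse <- function(model, data) {
  mean((data$mpg - predict(model, data))^2)
}
base_mse <- mse(fit, mtcars)

# Shuffle one predictor at a time and record the change in MSE.
perm_increase <- sapply(c("wt", "hp", "qsec"), function(v) {
  shuffled <- mtcars
  shuffled[[v]] <- sample(shuffled[[v]])
  mse(fit, shuffled) - base_mse
})

sort(perm_increase, decreasing = TRUE)
```

Read this way, the large `%IncMSE` values for `City`, `Subscription.Type`, and `hour` indicate that the forest's accuracy depends most on those three variables.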

Predict on test dataset for Random Forest

rf_pred<-predict(rf_model,linearreg_test)
linearreg_test$total_rides_rf_pred=rf_pred # Save the predictions as variable total_rides_rf_pred on test

Measure model performance with Mean Absolute Error (MAE) to evaluate the model

MAE(linearreg_test$total_rides,linearreg_test$total_rides_rf_pred)
## [1] 17.87722

Measure model fit with Root Mean Square Error (RMSE) to evaluate the standard deviation of the model prediction error. A smaller value indicates better model performance.

rmse(linearreg_test$total_rides,linearreg_test$total_rides_rf_pred)
## [1] 48.67429

Save all the predictions by day and hour

linearreg_test[is.na(linearreg_test)]<-0
## Warning in `[<-.factor`(`*tmp*`, thisvar, value = 0): invalid factor level,
## NA generated
predictions_df<-as.data.frame(linearreg_test) %>%
  group_by(date,hour)%>%
  summarise(real=sum(total_rides),
            predictions_mfull=sum(total_rides_mfull_pred),
            predictions_bic=sum(total_rides_BIC_pred),
            predictions_aic=sum(total_rides_AIC_pred),
            predictions_rpart=sum(total_rides_rpart_pred),
            predictions_rf=sum(total_rides_rf_pred))
head(predictions_df)
## Source: local data frame [6 x 8]
## Groups: date [1]
## 
##         date   hour  real predictions_mfull predictions_bic
##       <date> <fctr> <int>             <dbl>           <dbl>
## 1 2015-01-01      0    10          24.16185        31.16064
## 2 2015-01-01      1    12         -98.04242       -90.96437
## 3 2015-01-01      2     1        -142.99458      -135.76282
## 4 2015-01-01      3     1        -140.81580      -133.97916
## 5 2015-01-01      5     6          98.97683       105.16323
## 6 2015-01-01      8     1           0.00000         0.00000
## # ... with 3 more variables: predictions_aic <dbl>,
## #   predictions_rpart <dbl>, predictions_rf <dbl>
write.csv(predictions_df,file="predictions_df.csv",row.names=FALSE)
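
The `[<-.factor` warning earlier in this step arises because `0` is not an existing level of the factor columns, so those entries stay `NA`. One possible alternative, sketched here on a small made-up data frame with hypothetical column names, is to restrict the replacement to numeric columns.

```r
# Toy data frame with one factor column and one numeric column.
df <- data.frame(
  hour = factor(c("0", NA, "2")),
  pred = c(1.5, NA, 3)
)

# Identify numeric columns and replace NA with 0 only in those.
num_cols <- vapply(df, is.numeric, logical(1))
df[num_cols] <- lapply(df[num_cols], function(x) replace(x, is.na(x), 0))

df
```

Note that replacing a missing prediction with 0 treats it as a prediction of zero rides, which affects the hourly sums. Whether that is appropriate depends on why the prediction was missing.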
\section{4. Discussion}

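Comparing MAE and RMSE across the models shows that the Random Forest model provided the most accurate predictions of hourly ridership, followed by the regression tree. The three linear models performed nearly identically to one another and were the least accurate.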
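
For reference, the test-set metrics printed in the model sections can be collected into a single table and sorted. The sketch below simply re-enters those printed values by hand; the data frame and its labels are hypothetical conveniences.

```r
# Test-set metrics copied from the console output of each model section.
results <- data.frame(
  model = c("Linear (Adjusted R Squared)", "Linear (BIC)", "Linear (AIC)",
            "Regression tree", "Random forest"),
  MAE   = c(59.06931, 59.07309, 59.07193, 22.74462, 17.87722),
  RMSE  = c(99.52303, 99.50097, 99.52603, 53.34946, 48.67429)
)

# Sort from most to least accurate by RMSE.
results[order(results$RMSE), ]
```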

\section{References}

R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from http://www.R-project.org/.

\section{Appendix}\section{I. Authors' Individual Contribution}

Elvin did the data exploration, Hellen designed the Random Forest model, Lee designed the linear models, and Tarek designed the Regression Tree model. All team members contributed to the data collection, preprocessing, and analysis.